[Samsung NP530U3C-A01] LID close, AC, and battery status events not produced anymore

Bug #1283589 reported by juanmanuel on 2014-02-22
144
This bug affects 28 people
Affects Status Importance Assigned to Milestone
Linux
Confirmed
High
linux (Ubuntu)
Medium
Unassigned

Bug Description

On my Samsung Series 5 NP530U3C-A01 the LID close, AC, and battery status events not produced anymore. The situation is the same (tested with trusty-desktop-amd64.iso 2014-02-22). I created a blog post where I can put all the information related to this bug here: http://zenstep.com.ar/samsung-linux/

In summary, after a suspend sleep with many events (8 plug/unplug, or battery dropping or increasing about 16%, maybe less), the Embedded Controller accumulates those events and stops producing GPE 0x17. Neither Windows nor Linux query those accumulated events, because the EC doesn't produce GPE 0x17 anymore. The issue persists between restarts and shutdowns, and even after re-suspending/re-resuming. Hitting the reset button through the hole in the back fixes the problem temporarily. That is, until the next unlucky suspend.

IMPORTANT: if the laptop is never ever suspended, the issue never comes back.

To force the issue to happen, you can either:
1) Sleep the computer in linux (by closing the lid or any other means).
2) Unplug from the wall, plug, unplug, plug, unplug, plug, unplug, plug (8 actions or more).
3) Resume from sleep. You'll note that the battery icon is fixed, and unplugging or plugging doesn't update battery status anymore. Also note that the ability to suspend by closing the LID is lost.

OR:
1) echo disable > /sys/firmware/acpi/interrupts/gpe17
2) plug, unplug, plug, unplug, plug, unplug, plug, unplug (8 actions)
3) echo enable > /sys/firmware/acpi/interrupts/gpe17

WORKAROUND: Turn off the computer, unplug it, hit the reset button in the back of the laptop, and then plug it again.

WORKAROUND (kernel patch 1): https://bugzilla.kernel.org/show_bug.cgi?id=44161#c133

BETTER WORKAROUND (kernel patch 2): https://bugzilla.kernel.org/show_bug.cgi?id=44161#c149

ProblemType: Bug
DistroRelease: Ubuntu 14.04
Package: linux-image-3.13.0-11-generic 3.13.0-11.31
ProcVersionSignature: Ubuntu 3.13.0-11.31-generic 3.13.3
Uname: Linux 3.13.0-11-generic x86_64
ApportVersion: 2.13.2-0ubuntu5
Architecture: amd64
AudioDevicesInUse:
 USER PID ACCESS COMMAND
 /dev/snd/controlC0: ubuntu 2546 F.... pulseaudio
CasperVersion: 1.337
Date: Sat Feb 22 22:57:34 2014
LiveMediaBuild: Ubuntu 14.04 LTS "Trusty Tahr" - Alpha amd64 (20140222)
MachineType: SAMSUNG ELECTRONICS CO., LTD. 530U3C/530U4C
ProcEnviron:
 TERM=xterm
 PATH=(custom, no user)
 LANG=en_US.UTF-8
 SHELL=/bin/bash
ProcFB: 0 inteldrmfb
ProcKernelCmdLine: initrd=/casper/initrd.lz file=/cdrom/preseed/hostname.seed boot=casper quiet splash -- BOOT_IMAGE=/casper/vmlinuz.efi
PulseList: Error: command ['pacmd', 'list'] failed with exit code 1: No PulseAudio daemon running, or not running as session daemon.
RelatedPackageVersions:
 linux-restricted-modules-3.13.0-11-generic N/A
 linux-backports-modules-3.13.0-11-generic N/A
 linux-firmware 1.125
SourcePackage: linux
UpgradeStatus: No upgrade log present (probably fresh install)
dmi.bios.date: 04/15/2013
dmi.bios.vendor: Phoenix Technologies Ltd.
dmi.bios.version: P14AAJ
dmi.board.asset.tag: Base Board Asset Tag
dmi.board.name: SAMSUNG_NP1234567890
dmi.board.vendor: SAMSUNG ELECTRONICS CO., LTD.
dmi.board.version: FAB1
dmi.chassis.asset.tag: No Asset Tag
dmi.chassis.type: 9
dmi.chassis.vendor: SAMSUNG ELECTRONICS CO., LTD.
dmi.chassis.version: 0.1
dmi.modalias: dmi:bvnPhoenixTechnologiesLtd.:bvrP14AAJ:bd04/15/2013:svnSAMSUNGELECTRONICSCO.,LTD.:pn530U3C/530U4C:pvr0.1:rvnSAMSUNGELECTRONICSCO.,LTD.:rnSAMSUNG_NP1234567890:rvrFAB1:cvnSAMSUNGELECTRONICSCO.,LTD.:ct9:cvr0.1:
dmi.product.name: 530U3C/530U4C
dmi.product.version: 0.1
dmi.sys.vendor: SAMSUNG ELECTRONICS CO., LTD.

While the power supply device (ADP1) seems to always understand when the cord is plugged in and removed, the battery (BAT1) does not seem to know when it is discharging or not.

Samsung Series 9 np900x4b with Fedora 17

$ uname -a
Linux Lethe 3.4.4-3.fc17.x86_64 #1 SMP Tue Jun 26 20:54:56 UTC 2012 x86_64 x86_64 x86_64 GNU/Linux

(Results of acpi -i -b -a)
*** AC PLUGGED IN ***
Battery 0: Full, 100%
Battery 0: design capacity 8400 mAh, last full capacity 8500 mAh = 100%
Adapter 0: on-line

*** AC REMOVED ***
Battery 0: Charging, 100%, until charged
Battery 0: design capacity 8400 mAh, last full capacity 8500 mAh = 100%
Adapter 0: off-line

*** System Suspended and woken back up ***

*** AC STILL REMOVED ***
Battery 0: Discharging, 100%, 06:33:49 remaining
Battery 0: design capacity 8400 mAh, last full capacity 8500 mAh = 100%
Adapter 0: off-line

*** AC PLUGGED IN ***
Battery 0: Unknown, 98%
Battery 0: design capacity 8400 mAh, last full capacity 8500 mAh = 100%
Adapter 0: on-line

*** SUSPEND AND WAKE ***

*** AC PLUGGED IN ***
Battery 0: Unknown, 98%
Battery 0: design capacity 8400 mAh, last full capacity 8500 mAh = 100%
Adapter 0: on-line

*** AC REMOVED ***
Battery 0: Charging, 98%, charging at zero rate - will never fully charge.
Battery 0: design capacity 8400 mAh, last full capacity 8500 mAh = 100%
Adapter 0: off-line

The same information is reflected when looking directly at the sysfs enteries.
/sys/class/power_supply/ADP1/online shows the correct status all the time. (1 for online, 0 offline)

/sys/class/power_supply/BAT1/status will show the correct status after a fresh suspend, reflecting the above results.

Created attachment 74581
dmesg

Created attachment 74591
lsmod

Created attachment 74601
lspci

Created attachment 74611
upower -d

Created attachment 74621
ACPI DSDT DSL

The effect this has on my system is that upowerd does not always know if the laptop is running on AC or Battery. This also means that things like the battery indicator in Gnome 3 and tuned do not end up in the right state.

Also affects new Ivy Bridge Samsung Series 9 models (observed on 900X4C). Also tested this using mainline kernels without vendor changes and got the same behavior.

Just in case the NP400x4C (Ivy Bridge version) give different values, here are the results of acpi -i -b -a on this system.

** AC Off **

Battery 0: Discharging, 70%, 03:46:06 remaining
Battery 0: design capacity 8400 mAh, last full capacity 8700 mAh = 100%
Adapter 0: off-line

** AC On **

Battery 0: Discharging, 70%, 01:55:16 remaining
Battery 0: design capacity 8400 mAh, last full capacity 8700 mAh = 100%
Adapter 0: on-line

** Suspend and wake **

** AC Still On **

Battery 0: Charging, 70%, 00:50:05 until charged
Battery 0: design capacity 8400 mAh, last full capacity 8700 mAh = 100%
Adapter 0: on-line

** AC Off **

Battery 0: Charging, 70%, 02:11:35 until charged
Battery 0: design capacity 8400 mAh, last full capacity 8700 mAh = 100%
Adapter 0: off-line

** Suspend and wake in this state **

** AC Off **

Battery 0: Discharging, 70%, 05:06:01 remaining
Battery 0: design capacity 8400 mAh, last full capacity 8700 mAh = 100%
Adapter 0: off-line

** AC On **

Battery 0: Discharging, 70%, 406:00:00 remaining
Battery 0: design capacity 8400 mAh, last full capacity 8700 mAh = 100%
Adapter 0: on-line

What I find quite interesting about these is that the times change as the AC state changes and seem to reflect the "correct" time, i.e. time to discharge when disconnected and time to charged when connected. So something is detecting the change, it's just not propagating to all properties.

Should have said, the above came from an Ubuntu 12.04 machine using mainline kernel packages.

Linux XXXXXX 3.4.0-030400-generic #201205210521 SMP Mon May 21 09:22:02 UTC 2012 x86_64 x86_64 x86_64 GNU/Linux

Also affected, Samsung 5 Series NP530U3C laptop, same symptoms:

Plugged in:
Battery 0: Charging, 79%, 00:27:20 until charged
Battery 0: design capacity 6100 mAh, last full capacity 5900 mAh = 96%
Adapter 0: on-line

Not plugged in:
Battery 0: Charging, 79%, 01:26:32 until charged
Battery 0: design capacity 6100 mAh, last full capacity 5900 mAh = 96%
Adapter 0: off-line

uname -a
Linux jaytec 3.2.0-27-generic #43-Ubuntu SMP Fri Jul 6 14:25:57 UTC 2012 x86_64 x86_64 x86_64 GNU/Linux

Also had Ubuntu Quantal 3.5 kernel before wiping and reinstalling 3.2. The bug existed there also.

Created attachment 75961
dmesg

Created attachment 75971
lshw

Created attachment 75981
lsmod

Created attachment 75991
lspci

Created attachment 76001
upower

This for me seems fixed, although my Samsung laptop brightness buttons cause the screen to flicker and thus are useless now :)

uname -a
Linux jaytec 3.2.0-27-generic #43-Ubuntu SMP Fri Jul 6 14:25:57 UTC 2012 x86_64 x86_64 x86_64 GNU/Linux

Kernel has not updated, not quite sure what update fixed it, possibly a driver update from Ubuntu http://ppa.launchpad.net/ubuntu-x-swat/x-updates/ Ubuntu PPA which offers updated X drivers.

38 comments hidden view all 283 comments
juanmanuel (rockerito99) wrote :
description: updated
juanmanuel (rockerito99) wrote :

Attached DSDT of a Samsung Series 5 NP530U3C ultrabook with the same problem. Bios version is latest: P14AAJ

description: updated
juanmanuel (rockerito99) wrote :

This is the program I made, that "unstucks" the computer so that it can send LID and AC and Battery events again. (it queries the embedded controller queued events, thus unblocking the EC so that it can start sending them again).

Ideally run after resume from sleep, or at any other time.

juanmanuel (rockerito99) wrote :

Script that calls the program found in the other attachment after resume from sleep.

This change was made by a bot.

Changed in linux (Ubuntu):
status: New → Confirmed

This is a patch created by Lan Tianyu on the kernel bugzilla to do the same that my workaround did, but in a more proper way, and from the kernel:

          https://bugzilla.kernel.org/show_bug.cgi?id=44161#c133

I tested it:

          https://bugzilla.kernel.org/show_bug.cgi?id=44161#c135

and it works.
Lets hope more people test this patch so that it can be included some day in the kernel.

description: updated
juanmanuel (rockerito99) wrote :

More description about this issue found I posted in blog format here:

           http://www.zenstep.com.ar/samsung-laptop
--
Juan Manuel Cabo

juanmanuel (rockerito99) wrote :
tags: added: patch
summary: - LID close and AC and battery status events not produced anymore on
- samsung ultrabook.
+ [Samsung NP530U3C-A01] LID close and AC and battery status events not
+ produced anymore
summary: - [Samsung NP530U3C-A01] LID close and AC and battery status events not
+ [Samsung NP530U3C-A01] LID close, AC, and battery status events not
produced anymore
description: updated
juanmanuel (rockerito99) wrote :

Christopher: you changed my description of the bug, and in doing so, confused the two different ways to force the issue to show up. You listed the second way as a fourth item of the first way, when it is a different way.

Take a look at the original description.
--
Juan Manuel Cabo

tags: added: cherry-pick

Also affect Serie 9 (NP900X3C in my case)

 Description: Ubuntu 13.10
 Release: 13.10

 Ubuntu 3.11.0-17.31-generic 3.11.10.3

juanmanuel workaround works.

Waiting Ubuntu patch

mmalmeida (mmalmeida) wrote :

Alfo affects: Samsung series 9 NP900X4C

Tim Edwards (tkedwards) wrote :

I have a Samsung NP530U3C-A01 laptop and I can confirm this bug affects my laptop. After applying the workaround script from juanmanuel in #3 and #4 I re-tested and the problem no longer appears.

Thanks juanmanuel for writing the fix and being so patient with Ubuntu's ridiculous bug policies - just don't lose your laptop otherwise we'll have to start this whole bug report process again!

«Thanks juanmanuel for writing the fix and being so patient with Ubuntu's ridiculous bug policies»

I think it's more of a one-man trolling issue than a policy problem.

juanmanuel (rockerito99) wrote :

UPDATED v2: Safer because it obtains the EC ports automatically from /proc/ioports, and it micro-pauses between queries, so that the EC returns each event only once. This replaces the little program in #3.
________
This is the program I made, that "unstucks" the computer so that it can send LID and AC and Battery events again. (it queries the embedded controller queued events, thus unblocking the EC so that it can start sending them again).

Ideally run after resume from sleep, or at any other time.

Tim Edwards (tkedwards) wrote :

@andrea-lazzarotto
To be fair it does say "One defect, per person, per hardware, per report" at https://wiki.ubuntu.com/Bugs/BestPractices#X.2BAC8-Reporting.Focus_on_One_Issue.

The 'One defect, per person' is obviously stupid when there's >100 of us with the same hardware and same problem, opening 100 bugs isn't going to help anyone. But I wouldn't make it personal against the guy if he's just a volunteer enforcing Canonical's rules.

Tomer (tbrisker) wrote :

@tkedwards
Only that rule was made to prevent multiple bugs being cluttered together in one bug report, not to be taken literally as a means of filling the bug system with hundreds of useless duplicates, and I quote:
"Don't amalgamate every issue and hardware you find a problem in after an update... Instead, do a separate report for each distinct issue, on each hardware. This is how different hardware can have similar problematic symptoms (ex. computer won't boot), but different root causes, and patches that fix the different causes."
Obviously this doesn't mean that every affected user should file a separate report for a common problem - otherwise what's the point of having the "this affects me too" on the top.

juanmanuel (rockerito99) on 2014-02-23
description: updated
Jon Cowell (info-synct) wrote :

I also have a Samsung NP-A530U3C-A01UK laptop that exhibits the same behaviour. Since applying the fix from juanmanul all is now well.

Many thanks juanmanual for identifying the cause for this long running issue and for working diligently to find the fix.

juanmanuel (rockerito99) wrote :

This is a NEW and BETTER patch created by Kieran Clancy on the kernel bugzilla to do the same that my workaround did, but in a more proper way, and from the kernel:

          https://bugzilla.kernel.org/show_bug.cgi?id=44161#c149

I also tested it this weekend, and it works (I forced the issue to happen again, and then saw it being resolved automatically by a kernel compiled with this patch).

The most salient advantage over the previous patch is that it also unstucks the EC when the computer starts (in addition to sleep resume). This is more fool proof and suits more use cases.

It also checks the laptop model.

Lets hope more people test this patch so that it can be included some day in the kernel.

juanmanuel (rockerito99) wrote :

> Many thanks juanmanuel for identifying the cause for this long running issue and for working diligently to find the fix.

You're welcome, it fills me with joy reading all this good feedback!!
--
Juan Manuel Cabo
http://zenstep.com.ar/samsung-linux

Joseph Salisbury (jsalisbury) wrote :

@juanmanuel ,

Do you plan on submitting your patch for inclusion in the mainline kernel?

tags: added: kernel-da-key
Changed in linux (Ubuntu):
importance: Undecided → Medium
status: Confirmed → Triaged
juanmanuel (rockerito99) wrote :

Joseph: The best kernel patch so far was made by Kieran Clancy (see post #18 here). His patch is attached to a kernel bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=44161 in comment 149.

I'm just the author of the userspace workaround (post #3, #4, #14) that started it all (see the other issue here https://bugs.launchpad.net/ubuntu/+source/acpi/+bug/971061 comments 102 onwards) . It is a standalone program that unstucks the EC. On the other hand, Kieran's kernel patch now does the same in a cleaner way from inside the kernel.

In my humble opinion, Kieran Clancy's patch can be included in the kernel. I don't know whether Kieran will submit it.

Cheers!
--
Juan Manuel Cabo

juanmanuel (rockerito99) wrote :

Breaking News!

A fix in the form of a kernel patch has now been posted to the linux-acpi and linux-kernel mailing list:

        "[PATCH v2] ACPI / EC: Clear stale EC events on Samsung systems"
        http://marc.info/?l=linux-acpi&m=139359680828880&w=2

That kernel patch was made by Kieran Clancy and tested by me and others.

--
Juan Manuel Cabo
http://zenstep.com.ar/samsung-linux

zeeeeee (miesogeno) wrote :

juanmanuel,
No matter who found the latest/better solution, you are truly a hero.
After reading all the associated bug reports, I decided to use your script because I don't know how to compile the kernel, and I presume I would have to redo it every time the kernel got an update.

My laptop model is NP530U4C, finally with a normal behavior.
Gracias!

220 comments hidden view all 283 comments

Lv,

Sorry for the confusion. I am trying to clear the clouds:

1. Power plug detection is no longer a problem once it started working again. I am unable to make it fail on purpose, so I can't test if any kernel fixes that.

2. The lid switch is now non-functional with any kernel (state is always "open"):
- 3.17
- 3.18
- linux-pm /linux-next + gpe-handle21.patch + ec-flush{1,2,3,4,5,6,7}
- linux-pm /linux-next + gpe-handle21.patch + ec-flush{1,2,3,4,5,6} + ec-flush7-updated
However during ~40 lid open/close cycle with the latter kernel it worked 2 times (state = "closed").

3. suspend: doesn't work by closing the lid, works fine by suspending manually (KDE start menu)

4. resume: always works, but there is a backlight regression which has nothing to do with this issue. See https://bugs.freedesktop.org/show_bug.cgi?id=80773

Don't worry about my kernel building skills. I do that very very often and have established a workflow that always works. They are all straight from git, hand-configured per the specific hardware and use no modules. They are loaded by a hand-configured grub2 straight without initramfs.

Hi, San

Thanks for your help.

(In reply to San from comment #220)
> Hi!
>
> I'm not sure what i have to do. Some time ago i made kernel and uploaded it
> onto my server.
> I would like to do it again. So I'm now compiling this way:
>
> git clone git://kernel.ubuntu.com/ubuntu/ubuntu-utopic.git
> cd ubuntu-utopic/
> patch -p1 < ../155351.patch
> make mrproper
> wget http://beta.paragrafówka.pl/eksport/config-3.16.3 && mv config-3.16.3
> .config
> fakeroot debian/rules clean && fakeroot debian/rules binary-headers
> binary-generic
>
> I'm building as suggested in comment 206
> (https://bugzilla.kernel.org/show_bug.cgi?id=44161#c206) for now and then
> I'll try whole set (as suggested in second point).
>
> Is that right way?

Ortwin is providing me the testing using private email.
I'll post some of the results here after the progress are made for the investigation.
Maybe you can hold a while until the problem is correctly figured out.

The 2 commits are now reverted in linux-pm, it will appear in the Linux mainline in 1 month.

Best regards

Hi, Ortwin

(In reply to Ortwin Glück from comment #221)
> Lv,
>
> Sorry for the confusion. I am trying to clear the clouds:
>
> 1. Power plug detection is no longer a problem once it started working
> again. I am unable to make it fail on purpose, so I can't test if any kernel
> fixes that.

So the 2 commits didn't make a regression on the power plug?

> 2. The lid switch is now non-functional with any kernel (state is always
> "open"):
> - 3.17
> - 3.18
> - linux-pm /linux-next + gpe-handle21.patch + ec-flush{1,2,3,4,5,6,7}
> - linux-pm /linux-next + gpe-handle21.patch + ec-flush{1,2,3,4,5,6} +
> ec-flush7-updated
> However during ~40 lid open/close cycle with the latter kernel it worked 2
> times (state = "closed").
>
> 3. suspend: doesn't work by closing the lid, works fine by suspending
> manually (KDE start menu)
>
> 4. resume: always works, but there is a backlight regression which has
> nothing to do with this issue. See
> https://bugs.freedesktop.org/show_bug.cgi?id=80773

Can you perform a resume by opening the LID?
Also please upload the output of /proc/acpi/wakeup here.

> Don't worry about my kernel building skills. I do that very very often and
> have established a workflow that always works. They are all straight from
> git, hand-configured per the specific hardware and use no modules. They are
> loaded by a hand-configured grub2 straight without initramfs.

Good to know this!
That makes every testing possible. :-)

Thanks and best regards
-Lv

Created attachment 157011
proc/acpi/wakeup

I was finally able to do some tests, here are my results.
- Samsung NP900X3C
- using linux-pm/linux-next with patches from Comment 219.

Upon boot, I get the following behavior:
- Power plug detection appears to be working no matter what.
- Lid switch is recognized (open/close states are reported correctly using acpi_listen), but suspend does NOT trigger.

However, after a while, (maybe after a few ACPI events?), the LID switch actually triggers suspend correctly; I'm not sure the issue is related.
It does not seem to revert to the broken behavior even after some time. It works both on battery and AC power.

Opening the LID does not trigger a resume, but tbh that was never the case on this machine.

Lv, if you have more tests in mind / need any logs just let me know.

Attaching my /proc/acpi/wakeup.

Thanks,
Andrea

I'm unable to compile kernel. It patches ok, but i get:

== code ==
  Using /home/publiczny2/kernel/ubuntu-utopic as source for kernel
  /home/publiczny2/kernel/ubuntu-utopic is not clean, please run 'make mrproper'
  in the '/home/publiczny2/kernel/ubuntu-utopic' directory.
==/code ==

Hi,

(In reply to Andrea Fagiani from comment #224)
> Created attachment 157011 [details]
> proc/acpi/wakeup
>
> I was finally able to do some tests, here are my results.
> - Samsung NP900X3C
> - using linux-pm/linux-next with patches from Comment 219.
>
> Upon boot, I get the following behavior:
> - Power plug detection appears to be working no matter what.
> - Lid switch is recognized (open/close states are reported correctly using
> acpi_listen), but suspend does NOT trigger.

Can you confirm if they still work after suspend/resume?
See test requirement A below.

> However, after a while, (maybe after a few ACPI events?), the LID switch
> actually triggers suspend correctly; I'm not sure the issue is related.
> It does not seem to revert to the broken behavior even after some time. It
> works both on battery and AC power.
>
> Opening the LID does not trigger a resume, but tbh that was never the case
> on this machine.

According to the /proc/acpi/wakeup - yes, only power button can wakeup the system.

> Lv, if you have more tests in mind / need any logs just let me know.

A. We want to confirm that this set of patches can fix old issues. We need to be careful before removing old quirks.
1. using linux-pm/linux-next with patches from Comment 219.
2. build/boot the kernel
3. do a suspend/resume
Can power plug and LID switch still work after resuming?

B. We want to see some proofs that Samsung EC firmware acts exactly as what we've inferred from the reversion requirements. So could you:
1. using linux-pm/linux-next with patches from Comment 219.
2. Uncomment the following line in the drivers/acpi/ec.c:
     /* #define DEBUG */
3. build/boot the kernel
4. do some LID state changes and make sure they are captured by acpi_listen.
5. upload the dmesg output here.

C. We want to confirm if the timely polling support is really required by Samsung EC firmware:
1. using linux-pm/linux-next
3. git am gpe-handle21.patch (attachment 156591 [details])
4. git am ec-flush1.patch (attachment 155351 [details])
5. git am ec-flush2.patch (attachment 155371 [details])
6. git am ec-flush3.patch (attachment 155381 [details])
7. git am ec-flush4.patch (attachment 155391 [details])
8. git am ec-flush5.patch (attachment 155411 [details])
9. git am ec-flush6.patch (attachment 155421 [details])
Do not apply the ec-flush7.patch, can power plug and LID switch still work, and can still work even after resuming?

D. We want to confirm if this is really a regression and current fix can:
1. Using linux-pm/linux-next
2. git revert 79149001
3. git revert df9ff918
Do not apply any of the posted patches, can power plug and LID switch still work? And
1. Using linux-pm/linux-next
Do not revert the "regression fix" commits and do not apply any of the posted patches, can power plug and LID switch still work?

Thanks in advance.

> A. We want to confirm that this set of patches can fix old issues. We need
> to be careful before removing old quirks.
> 1. using linux-pm/linux-next with patches from Comment 219.
> 2. build/boot the kernel
> 3. do a suspend/resume
> Can power plug and LID switch still work after resuming?

Yes, both power plug and LID switch work and are correctly captured by acpi_listen.

> B. We want to see some proofs that Samsung EC firmware acts exactly as what
> we've inferred from the reversion requirements. So could you:
> 1. using linux-pm/linux-next with patches from Comment 219.
> 2. Uncomment the following line in the drivers/acpi/ec.c:
> /* #define DEBUG */
> 3. build/boot the kernel
> 4. do some LID state changes and make sure they are captured by acpi_listen.
> 5. upload the dmesg output here.

Opened/closed 5 times, all of them showed up in acpi_listen.
Attaching dmesg.

> C. We want to confirm if the timely polling support is really required by
> Samsung EC firmware:
> 1. using linux-pm/linux-next
> 3. git am gpe-handle21.patch (attachment 156591 [details])
> 4. git am ec-flush1.patch (attachment 155351 [details])
> 5. git am ec-flush2.patch (attachment 155371 [details])
> 6. git am ec-flush3.patch (attachment 155381 [details])
> 7. git am ec-flush4.patch (attachment 155391 [details])
> 8. git am ec-flush5.patch (attachment 155411 [details])
> 9. git am ec-flush6.patch (attachment 155421 [details])
> Do not apply the ec-flush7.patch, can power plug and LID switch still work,
> and can still work even after resuming?

Both work fine, even after a suspend/resume.

Note that the ec-flush patches depend on the reverts, so I applied them even if they weren't explicitly listed. I can test without them later today if required.

> D. We want to confirm if this is really a regression and current fix can:
> 1. Using linux-pm/linux-next
> 2. git revert 79149001
> 3. git revert df9ff918
> Do not apply any of the posted patches, can power plug and LID switch still
> work? And

Upon boot, it works correctly. After a suspend/resume, power plug detection does not work anymore (it is greatly delayed) and neither does the LID switch (doesn't trigger at all).

> 1. Using linux-pm/linux-next
> Do not revert the "regression fix" commits and do not apply any of the
> posted patches, can power plug and LID switch still work?

I wasn't able to do this last test due to time constraints, I'll try and do that later today.

Created attachment 157111
dmesg - #DEBUG enabled

Download full text (3.6 KiB)

Hi,

The test results are very useful for us to understand this issue. :-)

(In reply to Andrea Fagiani from comment #227)
> > B. We want to see some proofs that Samsung EC firmware acts exactly as what
> > we've inferred from the reversion requirements. So could you:
> > 1. using linux-pm/linux-next with patches from Comment 219.
> > 2. Uncomment the following line in the drivers/acpi/ec.c:
> > /* #define DEBUG */
> > 3. build/boot the kernel
> > 4. do some LID state changes and make sure they are captured by
> acpi_listen.
> > 5. upload the dmesg output here.
>
> Opened/closed 5 times, all of them showed up in acpi_listen.
> Attaching dmesg.

Yes. I also found the 0x5E (5 times) and 0x5F (4 times) in the dmesg.
TBH, I didn't find proofs for comment 210 in it.
So the issue isn't as what I imagined before.

> > C. We want to confirm if the timely polling support is really required by
> > Samsung EC firmware:
> > 1. using linux-pm/linux-next
> > 3. git am gpe-handle21.patch (attachment 156591 [details])
> > 4. git am ec-flush1.patch (attachment 155351 [details])
> > 5. git am ec-flush2.patch (attachment 155371 [details])
> > 6. git am ec-flush3.patch (attachment 155381 [details])
> > 7. git am ec-flush4.patch (attachment 155391 [details])
> > 8. git am ec-flush5.patch (attachment 155411 [details])
> > 9. git am ec-flush6.patch (attachment 155421 [details])
> > Do not apply the ec-flush7.patch, can power plug and LID switch still work,
> > and can still work even after resuming?
>
> Both work fine, even after a suspend/resume.

Thus, in order to fix Samsung issue.
Either we need timely polling quirk (since test A works)
Or we need old CLEAR_ON_RESUME quirk (since test C works).

> Note that the ec-flush patches depend on the reverts, so I applied them even
> if they weren't explicitly listed. I can test without them later today if
> required.

Not necessary :-). I only want the kernel test results __wih__ the reverts (as they conflict with the patches posted here) so I don't need to post another set of rebased patches for current linux-pm/linux-next.

> > D. We want to confirm if this is really a regression and current fix can:
> > 1. Using linux-pm/linux-next
> > 2. git revert 79149001
> > 3. git revert df9ff918
> > Do not apply any of the posted patches, can power plug and LID switch still
> > work? And
>
> Upon boot, it works correctly. After a suspend/resume, power plug detection
> does not work anymore (it is greatly delayed) and neither does the LID
> switch (doesn't trigger at all).

Thus the regression seems to be: "the 2 commits have broken the CLEAR_ON_RESUME quirk".

I guess SCI_EVT is set after resuming but GPE doesn't fire.
If this is the case, this isn't a silicon/firmware bug but a driver bug.
As EC GPE is edge triggered and GPE might have already been fired before suspending.
If so, we need to improve the driver and shouldn't fix the driver bug using any kind of quirk.

So I want to check if the SCI_EVT is set after resuming.
Let me think about a way to confirm this.
I'll add another test later.

> > 1. Using linux-pm/linux-next
> > Do not revert the "regression fix" commits and do not apply any of the
> > posted patches...

Read more...

Hi, Andrea

Please use either of the following working combinations:
1. test A
2. test C
3. test D2 (which you haven't done yet)

If you'll do test D2 for us, you can just do it and capture the confirmation information at same time.

Please do:
1. suspend
2. do some LID opening/closing to trigger old issue
3. resume
4. dmesg > dmesg-right-after-resume.txt

And post the dmesg-right-after-resume.txt here.
If the kernel log buffer size is not big enough to contain all messages right after resuming, please increase it using the kernel boot parameter - log_buf_len=1M.

Thanks and best regards

Hi, Andrea

Sorry that I forgot to ask you to enable the EC log to capture the debug information.

2. Uncomment the following line in the drivers/acpi/ec.c:
     /* #define DEBUG */

(In reply to Lv Zheng from comment #230)
> Hi, Andrea
>
> Please use either of the following working combinations:
> 1. test A
> 2. test C
> 3. test D2 (which you haven't done yet)
>
> If you'll do test D2 for us, you can just do it and capture the confirmation
> information at same time.
>
> Please do:
> 1. suspend
> 2. do some LID opening/closing to trigger old issue
> 3. resume
> 4. dmesg > dmesg-right-after-resume.txt
>
> And post the dmesg-right-after-resume.txt here.
> If the kernel log buffer size is not big enough to contain all messages
> right after resuming, please increase it using the kernel boot parameter -
> log_buf_len=1M.

Thanks
-Lv

Created attachment 157351
dmesg-right-after-resume_A

Created attachment 157361
dmesg-right-after-resume_C

Created attachment 157371
dmesg-right-after-resume_D2

Hi Lv,
I captured and uploaded the dmesg as requested.

Concerning test D2, both power plug and LID switch seem to work fine even after a suspend/resume. You'll notice I've played with it a little bit before performing the suspend test.

Hope it helps,
Andrea

Download full text (4.0 KiB)

Hi, Andrea

The results make something clear to me.
Thanks for the great job.

For the 3 test results, SCI_EVT is always flagged after resuming:
Test A (regression fixes + cleanup + timely polling quirk):
[ 42.608720] PM: Restoring platform NVS memory
...
[ 42.649355] ACPI : EC: EC_SC(R) = 0x28 SCI_EVT=1 BURST=0 CMD=1 IBF=0 OBF=0
Test C (regression fixes + cleanup + CLEAR_ON_RESUME quirk):
[ 21.764941] PM: Restoring platform NVS memory
...
[ 21.811916] ACPI : EC: EC_SC(R) = 0x28 SCI_EVT=1 BURST=0 CMD=1 IBF=0 OBF=0
Test D2 (regression fixes + CLEAR_ON_RESUME quirk):
[ 172.102165] PM: Restoring platform NVS memory
...
[ 172.149860] ACPI : EC: EC_SC(R) = 0x28 SCI_EVT=1 BURST=0 CMD=1 IBF=0 OBF=0
This can explain the cause of the original bug identified in comment 129.
The EC behavior is not broekn here. We shouldn't implement a quirk for it since this is a driver bug - if the EC GPE is not fired, we don't have means to poll the events.

For the regression reported in comment 184, I only find the proof in test A that we need both the 2 commits to be reverted:
Test A (regression fixes + cleanup + timely polling quirk):
[ 42.662695] ACPI : EC: ***** Command(QR_EC) started *****
[ 42.662697] ACPI : EC: ===== TASK (2) =====
[ 42.662702] ACPI : EC: EC_SC(R) = 0x00 SCI_EVT=0 BURST=0 CMD=0 IBF=0 OBF=0
[ 42.662703] ACPI : EC: EC_SC(W) = 0x84
[ 42.662707] ACPI : EC: ***** Event running *****
[ 42.662827] ACPI : EC: ##### Query(0x5e) stopped #####
[ 42.666037] ACPI : EC: ===== TASK (2) =====
[ 42.666042] ACPI : EC: EC_SC(R) = 0x09 SCI_EVT=0 BURST=0 CMD=1 IBF=0 OBF=1
[ 42.666045] ACPI : EC: EC_DATA(R) = 0x5f
[ 42.666046] ACPI : EC: ***** Command(QR_EC) hardware completion *****
[ 42.666047] ACPI : EC: ***** Command(QR_EC) stopped *****
The event of 0x5F is returned when SCI_EVT=0.
So we need to revert the 2 commits to allow QR_EC to be sent when SCI_EVT=0 and there is really non 0x00 events need to be drained when SCI_EVT=0.
TBH, the EC behavior is broekn here, thus IMO this is a different issue than the one identified in comment 129.
It seems both the quirks (drain way of CLEAR_ON_RESUME, timely polling way) can work it around. If this issue can be reproduced during runtime, then the CLEAR_ON_RESUME can not work it around at that time. According to your tests, the problem cannot be easily reproduced during runtime. So we can stay unchanged until someone else blaims.

If we have to assume this is only a problem happened during resuming and if we want to root cause why the SCI_EVT is suddently cleared after resuming, we will have to assume that something during the resuming steps has caused this issue because:
1. In test C, 2 stale events are returned with SCI_EVT=1:
[ 21.814019] ACPI : EC: EC_SC(R) = 0x29 SCI_EVT=1 BURST=0 CMD=1 IBF=0 OBF=1
[ 21.814022] ACPI : EC: EC_DATA(R) = 0x5e
[ 21.817333] ACPI : EC: EC_SC(R) = 0x29 SCI_EVT=1 BURST=0 CMD=1 IBF=0 OBF=1
[ 21.817351] ACPI : EC: EC_DATA(R) = 0x5f
So SCI_EVT will not be spontaneously cleared during acpi_ec_unblock_transactions().
2. In test A, 2 stale events are returned with different SCI_EVT value:
[ 42.656421] ACPI : EC: EC_SC(R) = 0x29 SCI_EVT=1 BURST=0 CMD=1 IBF...

Read more...

Hi, Andrea

I checked the log again. And the following seems to be wrong.

> It seems both the quirks (drain way of CLEAR_ON_RESUME, timely polling way)
> can work it around. If this issue can be reproduced during runtime, then the
> CLEAR_ON_RESUME can not work it around at that time. According to your
> tests, the problem cannot be easily reproduced during runtime. So we can
> stay unchanged until someone else blaims.

Though you didn't face real functional issues, there are following entries in the test D2 log:
[ 31.072607] ACPI : EC: EC_SC(R) = 0x09 SCI_EVT=0 BURST=0 CMD=1 IBF=0 OBF=1
[ 31.072613] ACPI : EC: EC_DATA(R) = 0x66
[ 77.835639] ACPI : EC: EC_SC(R) = 0x09 SCI_EVT=0 BURST=0 CMD=1 IBF=0 OBF=1
[ 77.835647] ACPI : EC: EC_DATA(R) = 0x66
[ 79.939044] ACPI : EC: EC_SC(R) = 0x09 SCI_EVT=0 BURST=0 CMD=1 IBF=0 OBF=1
[ 79.939053] ACPI : EC: EC_DATA(R) = 0x66
[ 82.006084] ACPI : EC: EC_SC(R) = 0x29 SCI_EVT=1 BURST=0 CMD=1 IBF=0 OBF=1
[ 82.006087] ACPI : EC: EC_DATA(R) = 0x66
Thanks for you long time test or we won't find them out.
The _Q66 is as follows:
                    Method (_Q66, 0, NotSerialized)
                    {
                        P8XH (Zero, 0x66)
                        If (LEqual (B1EX, One))
                        {
                            Notify (BAT1, 0x80)
                        }
                    }
It updates battery status.
Then it means the assumption in comment 210 might be real and we need the "timely polling quirk" for samsung to have everything wroked around...
So I'll prepare the whole patchset posted here for upstream according to this result.

In test A, I can even see such log entries:
[ 1.788647] ACPI : EC: ***** Command(QR_EC) started *****
[ 1.788651] ACPI : EC: ===== TASK (1) =====
[ 1.788656] ACPI : EC: EC_SC(R) = 0x08 SCI_EVT=0 BURST=0 CMD=1 IBF=0 OBF=0
[ 1.788659] ACPI : EC: EC_SC(W) = 0x84
...
[ 1.800289] ACPI : EC: ===== IRQ (0) =====
[ 1.800295] ACPI : EC: EC_SC(R) = 0x29 SCI_EVT=1 BURST=0 CMD=1 IBF=0 OBF=1
[ 1.800300] ACPI : EC: EC_DATA(R) = 0x70
...
[ 1.800385] ACPI : EC: ***** Command(QR_EC) stopped *****
It seems for Samsung EC, the firmware can even return event that happens after we send the query command. There can really be a big time slot between driver pushes the command byte and firmware fetches it.

Thanks and best regards

Hi Lv,

Thanks for all your work on this issue. It's certainly not easy, especially if you don't have the hardware available to test on yourself.

You are probably right about the racy behaviour. One important point though, is that you would need to miss either 8 (or was it 16?) events in a row during run time for the buffer to fill and for SCI_EVT to stop being given entirely.

Under normal usage, I think there's almost always going to be an event regularly enough that this "full" condition never occurs. It was only found to be a problem during suspend, where the EC controller seemed to continue logging, and after enough battery level or AC adapter changes it would stop triggering events even after resume.

My HDD was too full recently to build kernels, but I have just freed some space (a bit late, sorry) so if you want me to test anything please let me know.

Download full text (3.4 KiB)

(In reply to Kieran Clancy from comment #238)
> Hi Lv,
>
> Thanks for all your work on this issue. It's certainly not easy, especially
> if you don't have the hardware available to test on yourself.
>
> You are probably right about the racy behaviour. One important point though,
> is that you would need to miss either 8 (or was it 16?) events in a row
> during run time for the buffer to fill and for SCI_EVT to stop being given
> entirely.
>
> Under normal usage, I think there's almost always going to be an event
> regularly enough that this "full" condition never occurs. It was only found
> to be a problem during suspend, where the EC controller seemed to continue
> logging, and after enough battery level or AC adapter changes it would stop
> triggering events even after resume.
>
> My HDD was too full recently to build kernels, but I have just freed some
> space (a bit late, sorry) so if you want me to test anything please let me
> know.

Hi,

Thanks a lot!
We failed to find this platform selling on China market.

I was thinking this issue doesn't exist during runtime. That's why I said in comment 236 that it might be a BIOS _WAK issue or something in acpi_hw_legacy_wake() has cleared SCI_EVT and we may need further investigation. But finally in comment 237, I found it occurred not only after resuming, but also runtime, so further investigation doesn't matter any more because this is definitely an EC firmware behavior according to the facts.

In our driver, we will queue 2nd QR_EC after sending the 1st QR_EC and before fetching the returned event. At that time, SCI_EVT should still be 1, so we can always send 2 QR_EC for 1 SCI_EVT=1 indication. This somewhat can reduce the reproduction ratio of the Samsung EC issue and that's why during runtime it can hardly be reproduced as we can drain events faster than they are accumulated in the firmware.

But the above code can only make a happen to work environment. If the host acts a bit slower than the target, this issue might still occurr during runtime when there are more than 3 events queued up and the driver will be unable to queue 3rd QR_EC if SCI_EVT is cleared after fetching the 1st event.

It seems the best way for Samsung is always sending QR_EC whatever SCI_EVT is until 0x00 returned. This driver behavior is reasonable from ACPI spec's point of view. The CLEAR_ON_RESUME quirk provided by you did this after resuming, we may need to extend this behavior to runtime.

But we do have many platforms act differently:
1. Some platforms never return 0x00 even when SCI_EVT=0, they return certain event value. Some of them even return the event value for which there is no _Qxx prepared in the namespace. So if we continously send QR_EC until 0x00 is returned, the process will never end on such platforms. The users of such platforms will sure blame us.
2. Some platforms (Acer) do not return anything if SCI_EVT=0, the EC query transaction will be blocked, and our driver cannot issue further EC transaction unless previous one is completed. So if we allow QR_EC to be sent without checking SCI_EVT, users of such platforms will complain.

What I'm going to do is:
1. extending the draining behavior - poll...

Read more...

Hi, Kieran

Can you:
1. use current linux-pm/linux-next branch
2. enable EC debugging
3. disable quirk
4. trigger the bug
5
. capture the dmesg right after resume
6. upload the dmesg here
for investigation?

Thanks in advance

Created attachment 157601
dmesg on linux-next without quirk showing EC working before but not after triggering bug

(In reply to Lv Zheng from comment #240)
> Hi, Kieran
>
> Can you:
> 1. use current linux-pm/linux-next branch
> 2. enable EC debugging

Is it enough to just uncomment the #define DEBUG?

> 3. disable quirk

Is it enough to prevent EC_FLAGS_CLEAR_ON_RESUME from being set? Is this what you meant?

> 4. trigger the bug
> 5. capture the dmesg right after resume
> 6. upload the dmesg here

See attached. I annotated the dmesg with a few extra > /dev/kmsg lines starting with 'KC'. They are:

[ 107.787915] KC just booted and logged in
[ 137.240937] KC screen close/open works as intended
[ 203.943414] KC about to suspend, but not trigger bug this time
[ 226.584995] KC back from suspend
[ 265.000194] KC screen close/open works as intended
[ 298.608956] KC AC change detected as intended
[ 318.874279] KC about to suspend and trigger bug
[ 348.834217] KC back from suspend
[ 374.777099] KC screen open/close not detected
[ 398.000040] KC AC change not detected
[ 425.733615] KC about to manually clear EC events with script
[ 442.663539] KC EC events cleared
[ 469.766095] KC AC change detected again
[ 496.219180] KC screen open/close detected again

So between 0 and 318 you can see the EC operating as intended including one suspend cycle.

After 318 I trigger the bug (by unplugging AC a bunch of times), and after that you can see that there is nothing at all logged when I either closed the screen or unplugged AC.

At 425 I manually simulated the quirk by sending 0x84 EC command queries and reading the data until the data is 0x00 using Juan Manuel's userspace program.

After that, EC events are detected properly again.

----

My analysis is below:

After the bug is triggered, SCI_EVT=1 is set just ONE time, immediately after resume:

[ 337.319012] ACPI : EC: EC_SC(R) = 0x28 SCI_EVT=1 BURST=0 CMD=1 IBF=0 OBF=0

It does not seem as though we ever handle this event properly though. Namely, there seems to be no corresponding "EC_SC(W) = 0x84". There is a couple of "EC_DATA(W) = 0x84" but I'm pretty sure these are totally different?

----

Further testing shows that this SCI_EVT=1 happens for the first resume after the bug is triggered, but not for the second or subsequent resumes.

That means that we are still going to need a wakeup quirk because if for some reason we fail to clear the EC state before the next suspend, we will never get another SCI_EVT=1 (even after a power cycle, I believe).

Created attachment 157611
dmesg on linux-next without quirk showing bug persisting over several suspend cycles

Of the three suspend/resume cycles shown in this dmesg, I trigger the EC bug during the first suspend time.

It comes back with SCI_EVT=1 set the first time, but the 2nd and 3rd resumes do not have this. In a moment I will confirm if this persists across power cycles.

It seems the right behaviour for affected Samsung machines is to send QR_EC until data is 0x00 not just if we get SCI_EVT=1, but additionally on boot or resume.

Created attachment 157631
dmesg on linux-next without quirk showing bug persisting over power cycle

Confirming that once the bug is initially triggered, we don't get SCI_EVT=1 even on a power cycle, so we still need the boot/resume quirk. At least, that's my interpretation.

I hope the dmesgs were helpful. Unfortunately I'm not going to have internet access for the next week (going to the outback where there is not even any mobile reception), but I'm happy to test more things when I return.

Download full text (3.5 KiB)

(In reply to Kieran Clancy from comment #241)
> Created attachment 157601 [details]
> dmesg on linux-next without quirk showing EC working before but not after
> triggering bug
>
> (In reply to Lv Zheng from comment #240)
> > Hi, Kieran
> >
> > Can you:
> > 1. use current linux-pm/linux-next branch
> > 2. enable EC debugging
>
> Is it enough to just uncomment the #define DEBUG?

Yes.

> > 3. disable quirk
>
> Is it enough to prevent EC_FLAGS_CLEAR_ON_RESUME from being set? Is this
> what you meant?

Yes.

> > 4. trigger the bug
> > 5. capture the dmesg right after resume
> > 6. upload the dmesg here
>
> See attached. I annotated the dmesg with a few extra > /dev/kmsg lines
> starting with 'KC'. They are:
>
> [ 107.787915] KC just booted and logged in
> [ 137.240937] KC screen close/open works as intended
> [ 203.943414] KC about to suspend, but not trigger bug this time
> [ 226.584995] KC back from suspend
> [ 265.000194] KC screen close/open works as intended
> [ 298.608956] KC AC change detected as intended
> [ 318.874279] KC about to suspend and trigger bug
> [ 348.834217] KC back from suspend
> [ 374.777099] KC screen open/close not detected
> [ 398.000040] KC AC change not detected
> [ 425.733615] KC about to manually clear EC events with script
> [ 442.663539] KC EC events cleared
> [ 469.766095] KC AC change detected again
> [ 496.219180] KC screen open/close detected again

Great information!
Thanks.

> So between 0 and 318 you can see the EC operating as intended including one
> suspend cycle.
>
> After 318 I trigger the bug (by unplugging AC a bunch of times), and after
> that you can see that there is nothing at all logged when I either closed
> the screen or unplugged AC.
>
> At 425 I manually simulated the quirk by sending 0x84 EC command queries and
> reading the data until the data is 0x00 using Juan Manuel's userspace
> program.
>
> After that, EC events are detected properly again.

Great test cases!
Thanks.

> My analysis is below:
>
> After the bug is triggered, SCI_EVT=1 is set just ONE time, immediately
> after resume:
>
> [ 337.319012] ACPI : EC: EC_SC(R) = 0x28 SCI_EVT=1 BURST=0 CMD=1 IBF=0 OBF=0
>
> It does not seem as though we ever handle this event properly though.
> Namely, there seems to be no corresponding "EC_SC(W) = 0x84". There is a
> couple of "EC_DATA(W) = 0x84" but I'm pretty sure these are totally
> different?

This is a bug, in ec_poll(), there is no code to check SCI_EVT.
And event will thus be lost.

I think this has been fixed in the ec-flush6.patch.
+out:
+ if (status & ACPI_EC_FLAG_SCI &&
+ (!t || t->flags & ACPI_EC_COMMAND_COMPLETE))
+ __acpi_ec_set_event(ec);

We will have this flag checked in the advance_transaction.
So you won't see this with ec-flush[1-6].patch applied.

> Further testing shows that this SCI_EVT=1 happens for the first resume after
> the bug is triggered, but not for the second or subsequent resumes.
>
> That means that we are still going to need a wakeup quirk because if for
> some reason we fail to clear the EC state before the next suspend, we will
> never get another SCI_EVT=1 (even after a power cycle, I believe).

Yes, I think this is requi...

Read more...

(In reply to Kieran Clancy from comment #242)
> Created attachment 157611 [details]
> dmesg on linux-next without quirk showing bug persisting over several
> suspend cycles
>
> Of the three suspend/resume cycles shown in this dmesg, I trigger the EC bug
> during the first suspend time.
>
> It comes back with SCI_EVT=1 set the first time, but the 2nd and 3rd resumes
> do not have this. In a moment I will confirm if this persists across power
> cycles.
>
> It seems the right behaviour for affected Samsung machines is to send QR_EC
> until data is 0x00 not just if we get SCI_EVT=1, but additionally on boot or
> resume.

Yes.
I was planning to support this in this way.

Thanks

(In reply to Kieran Clancy from comment #243)
> Created attachment 157631 [details]
> dmesg on linux-next without quirk showing bug persisting over power cycle
>
> Confirming that once the bug is initially triggered, we don't get SCI_EVT=1
> even on a power cycle, so we still need the boot/resume quirk. At least,
> that's my interpretation.
>
> I hope the dmesgs were helpful. Unfortunately I'm not going to have internet
> access for the next week (going to the outback where there is not even any
> mobile reception), but I'm happy to test more things when I return.

Yes, it's very helpful.
Thanks for the help!

Best regards
-Lv

(In reply to Lv Zheng from comment #244)
> (In reply to Kieran Clancy from comment #241)
> > My analysis is below:
> >
> > After the bug is triggered, SCI_EVT=1 is set just ONE time, immediately
> > after resume:
> >
> > [ 337.319012] ACPI : EC: EC_SC(R) = 0x28 SCI_EVT=1 BURST=0 CMD=1 IBF=0
> OBF=0
> >
> > It does not seem as though we ever handle this event properly though.
> > Namely, there seems to be no corresponding "EC_SC(W) = 0x84". There is a
> > couple of "EC_DATA(W) = 0x84" but I'm pretty sure these are totally
> > different?
>
> This is a bug, in ec_poll(), there is no code to check SCI_EVT.
> And event will thus be lost.
>
> I think this has been fixed in the ec-flush6.patch.
> +out:
> + if (status & ACPI_EC_FLAG_SCI &&
> + (!t || t->flags & ACPI_EC_COMMAND_COMPLETE))
> + __acpi_ec_set_event(ec);
>
> We will have this flag checked in the advance_transaction.
> So you won't see this with ec-flush[1-6].patch applied.

I should take this back. :-)
This still need to be improved to indicate SCI_EVT even when there is a transaction running.

Thanks
-Lv

commit 74443bbed72ab22ee005ecb6ecdc657a8018e1db
Author: Lv Zheng <email address hidden>
Date: Wed Jan 14 19:28:47 2015 +0800

    ACPI / EC: Fix issues related to the SCI_EVT handling

shipped in Linux-4.0-rc1
closed.

commit 4c237371f290d1ed3b2071dd43554362137b1cce
Author: Lv Zheng <email address hidden>
Date: Wed Jan 4 11:17:17 2017 +0800

    ACPI / EC: Remove old CLEAR_ON_RESUME quirk

The above reintroduced this bug on my NP900x4c. Reverting the commit makes things work again.

(In reply to Elvis Pranskevichus from comment #249)
> commit 4c237371f290d1ed3b2071dd43554362137b1cce
> Author: Lv Zheng <email address hidden>
> Date: Wed Jan 4 11:17:17 2017 +0800
>
> ACPI / EC: Remove old CLEAR_ON_RESUME quirk
>
> The above reintroduced this bug on my NP900x4c. Reverting the commit makes
> things work again.

Confirmed, the bug is back again on samsung latops :-(

https://bbs.archlinux.org/viewtopic.php?pid=1767061#p1767061

Not exactly surprising when the change that fixed then gets reverted out of the kernel. Seems like Intel trying to force obsolescence of these systems.

(In reply to Mark Syms from comment #251)
> Not exactly surprising when the change that fixed then gets reverted out of
> the kernel. Seems like Intel trying to force obsolescence of these systems.

what has to be done to bring the fix back into the kernel code?
I mean at the first glance it was only for `Samsung hardware`, so I guess it does not affect any other vendors/hardwares.
The mentioned commit looks like just a `code cleanup` changes. but I'm not familiar with the kernel code.

Shall I file a new issue or could this be opened again?

Lv is not working on this now, and I'm new to the EC code, first let's confirm a quick revert patch.

Created attachment 274121
revert patch

please confirm
1. the problem exists with the latest upstream kernel, say, 4.16-rc1
2. the problem is gone after applying the patch attached

I can confirm that the regression exists with 4.15.3 and that your patch fixes it. 4.16 not tested.

I am experiencing the same problem (Arch Linux, kernel 4.15.3, KDE). Hardware: Samsung laptop model 900X3L. Will the patch be included in the Linux kernel?

Created attachment 274447
revert-patch works

confirmed, pacth is working on top of `4.15`

I am still facing this problem with Arch Linux + kernel 4.16.0-2. Hardware is a Samsung laptop (model: NP900X3L-KW1BR). Suggestions are welcome.

I compiled kernels 4.16.0 and 4.16.2 (in Arch Linux) using the patch in Comment #254 and I confirm that the patch fixes the problem. (Hardware: Samsung model NP900X3L-KW1BR.)

Hi @Rui,
may I know if the patch #Comment 254 will be pushed upstream ?

Changed in linux:
importance: Unknown → High
status: Unknown → Confirmed
Displaying first 40 and last 40 comments. View all 283 comments or add a comment.
This report contains Public information  Edit
Everyone can see this information.

Other bug subscribers

Remote bug watches

Bug watches keep track of this bug in other bug trackers.