Comment 352 for bug 986724

Revision history for this message
In , lv.zheng (lv.zheng-linux-kernel-bugs) wrote :

(In reply to Kieran Clancy from comment #241)
> Created attachment 157601 [details]
> dmesg on linux-next without quirk showing EC working before but not after
> triggering bug
>
> (In reply to Lv Zheng from comment #240)
> > Hi, Kieran
> >
> > Can you:
> > 1. use current linux-pm/linux-next branch
> > 2. enable EC debugging
>
> Is it enough to just uncomment the #define DEBUG?

Yes.

> > 3. disable quirk
>
> Is it enough to prevent EC_FLAGS_CLEAR_ON_RESUME from being set? Is this
> what you meant?

Yes.

> > 4. trigger the bug
> > 5. capture the dmesg right after resume
> > 6. upload the dmesg here
>
> See attached. I annotated the dmesg with a few extra > /dev/kmsg lines
> starting with 'KC'. They are:
>
> [ 107.787915] KC just booted and logged in
> [ 137.240937] KC screen close/open works as intended
> [ 203.943414] KC about to suspend, but not trigger bug this time
> [ 226.584995] KC back from suspend
> [ 265.000194] KC screen close/open works as intended
> [ 298.608956] KC AC change detected as intended
> [ 318.874279] KC about to suspend and trigger bug
> [ 348.834217] KC back from suspend
> [ 374.777099] KC screen open/close not detected
> [ 398.000040] KC AC change not detected
> [ 425.733615] KC about to manually clear EC events with script
> [ 442.663539] KC EC events cleared
> [ 469.766095] KC AC change detected again
> [ 496.219180] KC screen open/close detected again

Great information!
Thanks.

> So between 0 and 318 you can see the EC operating as intended including one
> suspend cycle.
>
> After 318 I trigger the bug (by unplugging AC a bunch of times), and after
> that you can see that there is nothing at all logged when I either closed
> the screen or unplugged AC.
>
> At 425 I manually simulated the quirk by sending 0x84 EC command queries and
> reading the data until the data is 0x00 using Juan Manuel's userspace
> program.
>
> After that, EC events are detected properly again.

Great test cases!
Thanks.

> My analysis is below:
>
> After the bug is triggered, SCI_EVT=1 is set just ONE time, immediately
> after resume:
>
> [ 337.319012] ACPI : EC: EC_SC(R) = 0x28 SCI_EVT=1 BURST=0 CMD=1 IBF=0 OBF=0
>
> It does not seem as though we ever handle this event properly though.
> Namely, there seems to be no corresponding "EC_SC(W) = 0x84". There is a
> couple of "EC_DATA(W) = 0x84" but I'm pretty sure these are totally
> different?

This is a bug, in ec_poll(), there is no code to check SCI_EVT.
And event will thus be lost.

I think this has been fixed in the ec-flush6.patch.
+out:
+ if (status & ACPI_EC_FLAG_SCI &&
+ (!t || t->flags & ACPI_EC_COMMAND_COMPLETE))
+ __acpi_ec_set_event(ec);

We will have this flag checked in the advance_transaction.
So you won't see this with ec-flush[1-6].patch applied.

> Further testing shows that this SCI_EVT=1 happens for the first resume after
> the bug is triggered, but not for the second or subsequent resumes.
>
> That means that we are still going to need a wakeup quirk because if for
> some reason we fail to clear the EC state before the next suspend, we will
> never get another SCI_EVT=1 (even after a power cycle, I believe).

Yes, I think this is required.
What I'm going to do is to add a flag "EC_FLAGS_EVENT_DRAINED" in the driver.
It is cleard after boot/resume and when SCI_EVT=1 is detected.
When it is cleared, QR_EC will be continously issued until 0x00 is returned.
After seeing 0x00, it will be set again to indicate events have been drained.
I'll make this piece of code as Samsung specific quirk for now.

Thanks and best regards
-Lv