[Hardy] ACPI Embedded Controller (EC) stops boot when kernel boot 'quiet' option is enabled or AC power is connected

Bug #191137 reported by Vladimir Meremyanin
112
Affects          Status         Importance   Assigned to    Milestone
Linux            Invalid        High
linux (Ubuntu)   Fix Released   High         Stefan Bader
Hardy            Fix Released   High         Stefan Bader

Bug Description

Installed Gutsy, then switched the repositories to Hardy's, and now have 3 kernels:

linux-image-2.6.22-14-generic - from gutsy, boots fine, X can't find nvidia drivers (since they are for new kernels), but it's ok.

Now the troubles:
linux-image-2.6.24-7-386 (2.6.24-7.12) - hangs in different places, but with a lot of ACPI-related errors
linux-image-2.6.24-7-generic (2.6.24-7.12) - hangs completely with 2 lines on screen:

ACPI: EC: acpi_ec_wait timeout, status=32, expect_event=1
ACPI: EC: read timeout, command=128

After these messages, only holding down the power button works.
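The numbers in those two messages are printed in decimal. As a rough aid to reading them, here is a minimal Python sketch that decodes them using the embedded controller status bits and command bytes defined in the ACPI specification. The helper names are made up for this illustration and are not part of any kernel or ACPI tool.

```python
# EC status register bits as laid out in the ACPI specification.
EC_STATUS_BITS = {
    0x01: "OBF",      # output buffer full: data is ready for the host
    0x02: "IBF",      # input buffer full: EC has not consumed the last byte
    0x08: "CMD",      # last byte written was a command rather than data
    0x10: "BURST",    # burst mode enabled
    0x20: "SCI_EVT",  # an SCI event is pending
    0x40: "SMI_EVT",  # an SMI event is pending
}

# Standard EC command bytes from the ACPI specification.
EC_COMMANDS = {
    0x80: "RD_EC (read)",
    0x81: "WR_EC (write)",
    0x82: "BE_EC (burst enable)",
    0x83: "BD_EC (burst disable)",
    0x84: "QR_EC (query)",
}


def decode_status(status):
    """Return the names of the flag bits set in an EC status byte."""
    return [name for bit, name in sorted(EC_STATUS_BITS.items()) if status & bit]


def decode_command(command):
    """Return a readable name for an EC command byte."""
    return EC_COMMANDS.get(command, "unknown (0x%02x)" % command)
```

Read this way, status=32 (0x20) means SCI_EVT is set while OBF is clear, and command=128 (0x80) is the EC read command. That would be consistent with a read whose data byte never arrived, although it does not by itself say why.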

Both kernels work fine in recovery mode! the only inconvenience is the need to select 'boot normally' :)

Vladimir Meremyanin (v-stiff) wrote :

upgraded to 2.6.24-8-generic, and got same troubles.

Vladimir Meremyanin (v-stiff) wrote :

Removed battery and unplugged power cord for a few seconds, and 2.6.24-8-generic started without any problems!

I think the kernel should somehow handle 'wrong' ACPI states (or whatever this was), since Vista loaded fine without battery removal.

Vladimir Meremyanin (v-stiff) wrote :

The problem is still present in 2.6.24-10. Is anyone reading these reports?

sandfly (sjpenn) wrote :

I have a similar booting issue with a Dell XPS M1210.

7.10 worked fine; however, 8.04 only boots in recovery mode, and the video drivers are broken.

any help will be appreciated.

sandfly

Thomas McKay (tom-mckay1) wrote :

I can confirm this problem on my sony vaio VGN-C240E laptop.

Thomas McKay (tom-mckay1) wrote :

also, i can confirm that i was able to boot by removing the battery and unplugging for several seconds.
it would appear the issue is centred on the battery/battery bay.

Vladimir Meremyanin (v-stiff) wrote :

no improvements in 2.6.24-11 and 2.6.24-12. Things even got worse, now I have no sound.

Alexey Starikovskiy (astarikovskiy) wrote :

Please check if this _debug_ patch improves the situation...

marco-peroverde (marco-peroverde) wrote :

I can confirm that problem for my fe41z.
After upgrading from 7.10 to Alpha 5 my system wasn't bootable anymore. Rec. mode worked.
Live CD from Alpha 6 couldn't boot. And the Beta LiveCD has now exactly the problem which is described above.

I can't check the debug patch, because I need that PC for work ;)

bitzer (bitzer) wrote :

Same problem after installing Hardy Heron Beta on my Vaio VGN-C1S/H. No problems with Gutsy.

marco-peroverde (marco-peroverde) wrote :

I just did some tests with the live cd.

If I try to boot from the beta CD with AC plugged in and the battery in the bay, I cannot boot.
I get the error message described in the first post.
If I try to boot from the beta CD with AC unplugged and the notebook running on battery, booting is possible.
Once the progress bar is displayed I can plug AC back in and Ubuntu keeps booting.

Vladimir Meremyanin (v-stiff) wrote :

I've finally compiled a kernel with the patch, and it works!

Thanks Alexey!

Thomas McKay (tom-mckay1) wrote :

I am sorry, but I cannot test the patch either.
If somebody has a laptop with this problem they do NOT depend on, please test it out so we can get this patch packaged.
I look forward to my next kernel update...

Vladimir Meremyanin (v-stiff) wrote :

What do you mean by 'NOT depend on'? Do you never reboot it?

All you need to compile a kernel is a couple of hours of CPU time, and some of your own: https://help.ubuntu.com/community/Kernel/Compile

Once it's installed you need one reboot to test it, and if it fails, boot your previous kernel.

marco-peroverde (marco-peroverde) wrote :

'Not depend on' means not daring to install 8.04 on a productive system...

Thomas McKay (tom-mckay1) wrote :

Okay Vlad, you twisted my arm. I compiled the kernel with the patch and have installed it... I'm about to reboot. I'll report back here with my results, you can be assured of that.

cheers.

Thomas McKay (tom-mckay1) wrote :

I can confirm without a doubt that this patch has solved this bug.

However, I forgot to include my wireless drivers when compiling my kernel :(

Vladimir Meremyanin (v-stiff) wrote :

It's not so painful as it seems, is it?

> However, I forgot to include my wireless drivers when compiling my kernel :(

This is strange, because kernel should be exactly in the same configuration as in binary package (unless you changed config in some way, I didn't touch anything, just applied patch, and wireless works)

Thomas McKay (tom-mckay1) wrote :

well i do have wireless... just 802.1x, and i need WPA2 enterprise support.

anyways, I have compiled kernels before, I just didn't expect to have the free time to do this, or the free time to fix something if i borked it.

Thanks a lot for the patch though, but I'm going to continue booting in recovery mode until this issue is solved at the repository level. It may be a hassle, especially for livecd users, however for me it is a small inconvenience.

Luca Cavalli (luca-cavalli) wrote :

Tested the proposed patch on my brother's laptop and now he can boot without recovery mode, so another positive feedback.

Vladimir Meremyanin (v-stiff) wrote :

2.6.24-15.17 Consistently hangs with power cable attached, and boots when battery only (without battery removal).

Leann Ogasawara (leannogasawara) wrote :

Hi Everyone,

Just a few things to note here. First the upstream git commit id and description which seems to have resulted in the issue you are seeing is as follows:

commit c04209a7948b95e8c52084e8595e74e9428653d3
Author: Alexey Starikovskiy <email address hidden>
Date: Tue Jan 1 14:12:55 2008 -0500

    ACPI: EC: Enable boot EC before bus_scan

    Some _STA methods called during bus_scan() might require EC region handler,
    which might be enabled later in the scan.
    Enable it explicitly before scan to avoid errors.

    Reference: http://bugzilla.kernel.org/show_bug.cgi?id=9627

    Signed-off-by: Alexey Starikovskiy <email address hidden>
    Signed-off-by: Len Brown <email address hidden>

So it seems fixes for another upstream bug resulted in what you are seeing. I also noticed the following in the dmesg output from Vladimir:

[ 0.000000] ACPI: BIOS bug: multiple APIC/MADT found, using 0
[ 0.000000] ACPI: If "acpi_apic_instance=2" works better, notify <email address hidden>

Just curious if you tried acpi_apic_instance=2 without the patch applied and if it helps the situation.

Unfortunately the kernel is currently frozen for Hardy as we're approximately 2 weeks away from the final release. Obviously reverting this patch will cause issues for others :( I'll ping the Ubuntu kernel team to try and see what can be done but it may be the case that this won't get resolved until the Intrepid Ibex 8.10 kernel opens for development.

It would also be helpful if someone who has this issue could first test the current upstream kernel to verify the issue still exists upstream. See https://wiki.ubuntu.com/KernelTeam/GitKernelBuild for help with building the upstream kernel. If the issue still exists, please post a comment to the upstream bugzilla report (http://bugzilla.kernel.org/show_bug.cgi?id=9627) noting the regression the patch has caused and that you've verified that removing the code (i.e. ifdef'ing it out) resolves the issue.

I apologize that this was overlooked until now. Apparently this report wasn't assigned to the right package (ie the Ubuntu kernel source package 'linux') and was being overlooked. It was recently reassigned to the appropriate 'linux' package. Just for future reference, https://wiki.ubuntu.com/Bugs/FindRightPackage can help with this. Thanks.

TJ (tj) wrote :

I've just spotted this bug report on the kernel-team mailing list thanks to Leann. I've been running a Vaio VGN-FE41Z since April 2007. I started with 32-bit builds but moved to x86_64 and have not tried 32-bit again since.

I've been testing/using Hardy alongside Gutsy all the way through the development process and not experienced this issue. The laptop is usually on mains power when it boots.

Interestingly, I did rewrite the ACPI DSDT to fix an issue with the battery-technology reported as non-rechargeable but that was ages ago and I can't see it being affected by this:

http://ubuntuforums.org/showthread.php?t=475801

I'll do some systematic tests tomorrow with 64-bit and 32-bit Hardy builds.

TJ (tj) wrote : Re: i386 Hardy boots only in recovery mode on VAIO FE41Z

This only affects the x86 i386 build. I'm investigating further.

marco-peroverde (marco-peroverde) wrote :

Just for the record: a few days ago I tried to get rid of the problem by upgrading the notebook's BIOS to the latest version.
Didn't work ;)

TJ (tj) wrote :

What we expect to see is:

[ 28.070535] ACPI: EC: Look up EC in DSDT
[ 28.077294] ACPI: Interpreter enabled
[ 28.077357] ACPI: (supports S0 S3 S4 S5)
[ 28.077612] ACPI: Using IOAPIC for interrupt routing
[ 28.078065] ACPI: EC: non-query interrupt received, switching to interrupt mode
[ 28.105273] ACPI: EC: GPE = 0x17, I/O: command/status = 0x66, data = 0x62
[ 28.105338] ACPI: EC: driver started in interrupt mode

But i386, when on external power, reports:

[ 18.127035] ACPI: EC: Look up EC in DSDT
[ 18.136591] ACPI: Interpreter enabled
[ 18.136594] ACPI: (supports S0 S3 S4 S5)
[ 18.136604] ACPI: Using IOAPIC for interrupt routing
[ 18.136844] ACPI: EC: non-query interrupt received, switching to interrupt mode
[ 18.634304] ACPI: EC: acpi_ec_wait timeout, status = 0, expect_event = 1
[ 18.634362] ACPI: EC: read timeout, command = 128
[ 18.634415] ACPI Exception (evregion-0420): AE_TIME, Returned by Handler for [EmbeddedControl] [20070126]
[ 18.634419] ACPI Exception (dswexec-0462): AE_TIME, While resolving operands for [OpcodeName unavailable] [20070126]
[ 18.634423] ACPI Error (psparse-0537): Method parse/execution failed [\_SB_.PCI0.LPCB.EC__._REG] (Node f7c4bd20), AE_TIME
[ 19.142018] ACPI: EC: missing confirmations, switch off interrupt mode.
[ 19.154286] ACPI: EC: GPE = 0x17, I/O: command/status = 0x66, data = 0x62
[ 19.154288] ACPI: EC: driver started in poll mode

The key part is:

[ 18.634419] ACPI Exception (dswexec-0462): AE_TIME, While resolving operands for [OpcodeName unavailable] [20070126]
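For comparing boots like the two logs quoted in this comment, the EC-related lines can be picked out of saved dmesg output with a short script. This is only an illustrative Python sketch: the function name is hypothetical, and the patterns cover just the message wording that appears in this report, not every message the EC driver can emit.

```python
import re

# Message fragments taken from the kernel logs quoted in this report.
TIMEOUT_RE = re.compile(r"ACPI: EC: (acpi_ec_wait|read) timeout")
MODE_RE = re.compile(r"ACPI: EC: driver started in (interrupt|poll) mode")


def summarize_ec_boot(dmesg_text):
    """Summarise EC start-up from dmesg text.

    Returns a dict with the number of EC timeout lines and the mode the
    driver reported starting in (None if no such line is present).
    """
    timeouts = 0
    mode = None
    for line in dmesg_text.splitlines():
        if TIMEOUT_RE.search(line):
            timeouts += 1
        match = MODE_RE.search(line)
        if match:
            mode = match.group(1)
    return {"timeouts": timeouts, "mode": mode}
```

Feeding it the output of `dmesg` from a good boot and from a boot on external power should show the difference at a glance: no timeouts and interrupt mode in the first case, timeouts followed by poll mode in the second.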

The ACPI EC._REG method is:

Method (_REG, 2, NotSerialized)
{
    If (LEqual (Arg0, 0x03))
    {
        Store (Arg1, ECON)
        Store (BATP, BNUM)
        Store (RSCL, B0SC)
        Store (RPWR, PWRS)
        Notify (BAT0, 0x81)
        PNOT ()
        If (LEqual (PRCP, One))
        {
            Notify (DOCK, Zero)
        }

        If (LEqual (WKSR, 0x02))
        {
            Notify (DOCK, One)
        }
    }
}

If the early-enabling of the EC is the reason, it looks to me as if one or more of the STORE() operations is failing because the target hasn't been declared in the ACPI name-space yet.

The call-chain looks to be:

drivers/acpi/ec.c::acpi_ec_transaction_unlocked()
 drivers/acpi/ec.c::acpi_ec_wait()

-- drivers/acpi/ec.c::acpi_ec_transaction_unlocked() --

for (; rdata_len > 0; --rdata_len) {
 result = acpi_ec_wait(ec, ACPI_EC_EVENT_OBF_1, force_poll);
 if (result) {
  pr_err(PREFIX "read timeout, command = %d\n", command);
  goto end;

--

-- acpi_ec_wait() --

if (likely(test_bit(EC_FLAGS_GPE_MODE, &ec->flags)) && ...

...
 if (acpi_ec_check_status(ec, event)) {
...
 } else {
  /* missing GPEs, switch back to poll mode */
  if (printk_ratelimit())
   pr_info(PREFIX "missing confirmations, "
    "switch off interrupt mode.\n");
  clear_bit(EC_FLAGS_GPE_MODE, &ec->flags);
 }

I'll do an ACPI debug test on this and see where it is going wrong.

TJ (tj) wrote :

On external power, I've just installed the 2008-04-10 Hardy i386 LiveCD.

The LiveCD hangs during boot with the messages:

ACPI: EC: acpi_ec_wait timeout, status=32, expect_event=1
ACPI: EC: read timeout, command=128

However, the *installed* system boots fine:

[ 13.169576] ACPI: EC: Look up EC in DSDT
[ 13.178074] ACPI: Interpreter enabled
[ 13.178137] ACPI: (supports S0 S3 S4 S5)
[ 13.178392] ACPI: Using IOAPIC for interrupt routing
[ 13.180182] ACPI: EC: non-query interrupt received, switching to interrupt mode
[ 13.205184] ACPI: EC: GPE = 0x17, I/O: command/status = 0x66, data = 0x62
[ 13.205249] ACPI: EC: driver started in interrupt mode

~$ uname -a
Linux hardy-i386 2.6.24-15-generic #1 SMP Tue Apr 8 00:33:51 UTC 2008 i686 GNU/Linux

Can I get some clarifications on the various reports?

1. Is anyone having problems with the *installed* i386?
2. What dates are the daily LiveCDs that *do* hang?
3. Have you tried the amd64 (64-bit) LiveCD?

This looks to me rather like something missing from the initrd of the CD.

Currently the workaround appears to be to *disconnect* external power whilst the i386 LiveCD is booting and reconnect it once the loading splash screen appears.

TJ (tj) wrote :

For some reason I can no longer reproduce the issue with the 2008-04-10 LiveCD, and I can't figure out why. I've tried removing the battery and doing a cold start, with and without external power, and restarting from a Gutsy session.

Report back if you're still seeing this issue with the most recent LiveCD (now 2008-04-11)

http://cdimage.ubuntu.com/daily-live/current/

slash2314 (slash2314) wrote :

I had the same problem with a sony vaio VGN-N320E. I am able to boot when I remove the quiet kernel option in grub. I am just using the ro and splash options.

TJ (tj) wrote : Re: [Hardy] ACPI Embedded Controller (EC) stops boot when kernel boot 'quiet' option is enabled

slash2314, thank-you!

In all my testing I usually remove "quiet splash" to watch the boot messages which explains why I was unable to reproduce it. I also found this does indeed affect x86_64 as well. Because I run development kernels I never have the "quiet" option enabled - hence my not being affected.

I've rewritten the bug title since it appears to affect more machines.

So, the revised work-around is to remove the "quiet" option from the kernel command line. If you're using an installed system, do this in /boot/grub/menu.lst and maybe change the default koptions.

For the LiveCD, press "F6" when the menu appears, press "End" to get to the end of the command line, and then use back-space to delete the "quiet" option. Press Enter to continue to boot.
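For an installed system, the menu.lst edit could be scripted roughly as follows. This is a minimal Python sketch under stated assumptions, not a tested tool: the function name is invented here, it only rewrites active lines whose first word is "kernel", and it normalises the spacing on those lines. Back up menu.lst before trying anything like it.

```python
def drop_quiet(menu_lst_text):
    """Remove the 'quiet' token from every 'kernel' line of a menu.lst."""
    out = []
    for line in menu_lst_text.splitlines():
        words = line.split()
        if words and words[0] == "kernel":
            # Keep the original indentation, drop only the 'quiet' token.
            indent = line[: len(line) - len(line.lstrip())]
            words = [word for word in words if word != "quiet"]
            line = indent + " ".join(words)
        out.append(line)
    return "\n".join(out) + "\n"
```

Commented lines are left untouched, so the default options mentioned above, which live in comment lines that update-grub reads, would still need changing by hand if the setting is to survive a kernel update.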

TJ (tj)
Changed in linux:
assignee: ubuntu-kernel-team → ubuntu-kernel-acpi
status: Triaged → In Progress

marco-peroverde (marco-peroverde) wrote :

Yep, booting without "quiet" works with the LiveCD, even with the Beta release.

TJ (tj) wrote :

Right, I know what is causing this now. We have a complex interaction of three things:

1. Early EC initialisation introduced by commit c04209a7948b95e8c52084e8595e74e9428653d3
2. Timing (especially on multiprocessor / multitasking systems)
3. Possible 'problem' with ACPI DSDT embedded controller logic

When the EC is initialised before the rest of the ACPI namespace, any references in the EC _REG() method to other node names will fail since they're not yet known. In the case of the PCs being affected by this bug, there is a Notify(BAT0, 0x81) or similar. BAT0 is not yet known but the notify causes a general purpose event (GPE) to be fired which goes unhandled and times out.

When boot option "quiet" is enabled there are very few kernel messages being logged so everything executes faster. When "quiet" is removed, or debugging of any kind is enabled, the additional time taken to log messages gives enough breathing space for the ACPI namespace to be scanned and the other objects created before the timeout occurs.

The _REG(RegionSpace, 1) control method is used to notify AML that an operation region is available. It is not clear from the ACPI specification whether _REG() is allowed to Notify() other nodes. If the namespace is fully scanned it shouldn't be an issue, but when the EC is started early via an ECDT or by virtue of the commit here (intended to provide the ECDT equivalent for BIOSes without it), then it can cause a failure.

I edited the EC's _REG() method and disabled the Notify(BAT0...) call and then installed the revised DSDT in an initrd image. I kept the original initrd with a different name and added an additional boot entry into the GRUB menu. I then tested both initrd images. The one containing the modified DSDT with Notify() disabled starts correctly. The original DSDT causes it to fail.

$ sudo acpidump -b -t DSDT -o DSDT.aml
$ iasl -d DSDT.aml
$ gedit DSDT.dsl

Device (EC)
...

 Method (_REG, 2, NotSerialized)
 {
  If (LEqual (Arg0, 0x03))
  {
   Store (Arg1, ECON)
   Store (BATP, BNUM)
   Store (RSCL, B0SC)
   Store (RPWR, PWRS)
   /* Notify (BAT0, 0x81) not allowed for early-init ECs */
   PNOT ()
   If (LEqual (PRCP, One))
   {
    Notify (DOCK, Zero)
   }

   If (LEqual (WKSR, 0x02))
   {
    Notify (DOCK, One)
   }
  }
 }
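The hand edit shown above could also be applied mechanically to the decompiled DSDT.dsl. The following Python sketch is only an illustration: the function names are hypothetical, it assumes the battery device is called BAT0 as in this DSDT, and its naive brace matching would be confused by braces inside strings or comments. It touches every _REG method in the file, so check the result before recompiling.

```python
import re

# Matches a Notify call on BAT0, e.g. "Notify (BAT0, 0x81)".
NOTIFY_BAT0 = re.compile(r"Notify\s*\(\s*BAT0\s*,[^)]*\)")
REG_METHOD = re.compile(r"Method\s*\(\s*_REG\b")


def _body_end(text, start):
    """Return the index just past the brace block that opens after start."""
    depth = 0
    for i in range(text.index("{", start), len(text)):
        if text[i] == "{":
            depth += 1
        elif text[i] == "}":
            depth -= 1
            if depth == 0:
                return i + 1
    raise ValueError("unbalanced braces after offset %d" % start)


def disable_bat0_notify_in_reg(dsl_text):
    """Comment out Notify (BAT0, ...) calls inside _REG methods only."""
    out, pos = [], 0
    for match in REG_METHOD.finditer(dsl_text):
        if match.start() < pos:
            continue  # already inside a span that was handled
        end = _body_end(dsl_text, match.start())
        out.append(dsl_text[pos:match.start()])
        body = dsl_text[match.start():end]
        out.append(NOTIFY_BAT0.sub(lambda m: "/* " + m.group(0) + " */", body))
        pos = end
    out.append(dsl_text[pos:])
    return "".join(out)
```

Limiting the change to _REG keeps battery notifications from other methods intact, which matches the intent of the manual edit.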

I rebuilt the DSDT:

$ iasl DSDT.dsl
$ sudo cp DSDT.aml /etc/initramfs-tools/
$ sudo mv /boot/initrd.img-2.6.24-16-generic /boot/initrd.img-2.6.24-16-generic-ec-bug
$ sudo update-initramfs -u ALL
$ sudo mv /boot/initrd.img-2.6.24-16-generic /boot/initrd.img-2.6.24-16-generic-ec-fix

Now edit /boot/grub/menu.lst, edit the initrd name and add another menu option:

title Ubuntu Hardy 64-bit, kernel 2.6.24-16-generic EC fix
root (hd0,4)
kernel /vmlinuz-2.6.24-16-generic root=UUID=bb2c3a14-1588-4fb9-8411-71f114b568b4 ro quiet
initrd /initrd.img-2.6.24-16-generic-ec-fix
quiet

title Ubuntu Hardy 64-bit, kernel 2.6.24-16-generic EC bug
root (hd0,4)
kernel /vmlinuz-2.6.24-16-generic root=UUID=bb2c3a14-1588-4fb9-8411-71f114b568b4 ro quiet
initrd /initrd.img-2.6.24-16-generic-ec-bug
quiet

------

I have a large collection of DSDTs from across the Vaio range as part of my SNC research. I ran an ana...


Thomas McKay (tom-mckay1) wrote : Re: [Bug 191137] Re: [Hardy] ACPI Embedded Controller (EC) stops boot when kernel boot 'quiet' option is enabled

Thank you so much for the suggestion!
From now on i'll boot loudly and proudly!

On Sat, Apr 12, 2008 at 12:33 AM, TJ <email address hidden> wrote:

> ** Bug watch added: Linux Kernel Bug Tracker #10444
> http://bugzilla.kernel.org/show_bug.cgi?id=10444
>
> ** Also affects: linux via
> http://bugzilla.kernel.org/show_bug.cgi?id=10444
> Importance: Unknown
> Status: Unknown

Changed in linux:
status: Unknown → Confirmed

slash2314 (slash2314) wrote : Re: [Hardy] ACPI Embedded Controller (EC) stops boot when kernel boot 'quiet' option is enabled

In the kernel bug report TJ submitted to bugzilla it stated that the latest working kernel was 2.6.22, but for me the bug is not present in 2.6.23.

Václav Šmilauer (eudoxos) wrote :

On vaio VGN-N21, editing the DSDT alone doesn't fix the freeze; unquiet boot works just fine.

TJ (tj) wrote :

Václav, your result is interesting, thank you. I didn't have time to endlessly repeat the reboot tests when I did the DSDT patch so it is possible that it was in some way coincidental that the kernel didn't crash those times.

Since doing that patch and having time to think about things more I'm leaning towards the idea that the power management hardware is generating the unhandled event when it sees the AC adapter is connected - not the EC._REG() method because of early init. If this 'new' theory is correct, then when the EC GPE events are enabled early the hardware can start interrupting. The problem is, as soon as I add even minimal debug messages to try and trace the code-path the bug goes away - the same as when removing the "quiet" kernel boot option!

Vladimir Meremyanin (v-stiff) wrote : Re: [Bug 191137] Re: [Hardy] ACPI Embedded Controller (EC) stops boot when kernel boot 'quiet' option is enabled

I've also encountered the lockup when booting on battery only.

Changed in linux:
status: Confirmed → Incomplete

TJ (tj) wrote : Re: [Hardy] ACPI Embedded Controller (EC) stops boot when kernel boot 'quiet' option is enabled

With "quiet" still enabled, try adding this boot option:

ec_intr=1

It should force the GPE to use interrupt mode.

Also, a long shot, but try:

acpi_os_name="Windows 2006"

since the Vaios often have Vista-specific functionality in the ACPI BIOS.

bitzer (bitzer) wrote :

Removed the second quiet option in /boot/grub/menu.lst and it now boots normally on the VAIO VGN-C1S/H. I downloaded the current ISO image yesterday (April 14th).

Vladimir Meremyanin (v-stiff) wrote : Re: [Bug 191137] Re: [Hardy] ACPI Embedded Controller (EC) stops boot when kernel boot 'quiet' option is enabled

The first (and so far only) attempt to boot with ec_intr=1 succeeded!

linux-image 2.6.24.16.18

TJ (tj) wrote : Re: [Hardy] ACPI Embedded Controller (EC) stops boot when kernel boot 'quiet' option is enabled

Vladimir, that is great news!

I'm building test kernels for several ACPI related bugs at present so I can't test it immediately, but if that proves to deal with this issue as a workaround until we can locate the root-cause, that is better in many ways than removing "quiet".

I wanted to record here my preliminary investigation into what "quiet" does that is different from a regular boot without it.

Firstly, the existing levels:

$ cat /proc/sys/kernel/printk
4 4 1 7

Those are console_loglevel, default_message_loglevel, minimum_console_loglevel and default_console_loglevel. You'll notice above that the default console level (the last value) is 7, which is what is used when "quiet" *is not* passed to the kernel.
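To check these values on a running system without counting columns by eye, something like the following Python sketch could be used. The field names follow the comments in the console_printk array in kernel/printk.c; the tuple and function names are invented for this example.

```python
from collections import namedtuple

# Field order matches the console_printk[] array in kernel/printk.c.
PrintkLevels = namedtuple(
    "PrintkLevels",
    ["console_loglevel", "default_message_loglevel",
     "minimum_console_loglevel", "default_console_loglevel"],
)


def parse_printk(text):
    """Parse the four integers found in /proc/sys/kernel/printk."""
    values = [int(field) for field in text.split()]
    if len(values) != 4:
        raise ValueError("expected 4 fields, got %d" % len(values))
    return PrintkLevels(*values)


def read_printk(path="/proc/sys/kernel/printk"):
    """Read and parse the live values (Linux only)."""
    with open(path) as handle:
        return parse_printk(handle.read())
```

On a "quiet" boot, parse_printk applied to the output shown above gives console_loglevel 4 and default_console_loglevel 7.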

In init/main.c::quiet_kernel()

 console_loglevel = 4;

In include/linux/kernel.h:

#define console_loglevel (console_printk[0])

and in kernel/printk.c:

#define MINIMUM_CONSOLE_LOGLEVEL 1 /* Minimum loglevel we let people use */
#define DEFAULT_CONSOLE_LOGLEVEL 7 /* anything MORE serious than KERN_DEBUG */

int console_printk[4] = {
 DEFAULT_CONSOLE_LOGLEVEL, /* console_loglevel */
 DEFAULT_MESSAGE_LOGLEVEL, /* default_message_loglevel */
 MINIMUM_CONSOLE_LOGLEVEL, /* minimum_console_loglevel */
 DEFAULT_CONSOLE_LOGLEVEL, /* default_console_loglevel */
};

So when running at level 4, fewer messages get printed to the console and therefore 'things' will happen faster than at the default level 7, when more messages are being printed. That suggests a timing issue, as I said previously, *unless* somehow some code is caught in a conditional expression based on loglevel.
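To make the difference between the two levels concrete: the kernel only sends a message to the console when its priority number is lower than console_loglevel. A small Python sketch of that rule follows; the helper names are hypothetical, while the priority names are the standard KERN_* levels.

```python
# Kernel message priorities: lower numbers are more severe.
KERN_LEVELS = {
    0: "KERN_EMERG", 1: "KERN_ALERT", 2: "KERN_CRIT", 3: "KERN_ERR",
    4: "KERN_WARNING", 5: "KERN_NOTICE", 6: "KERN_INFO", 7: "KERN_DEBUG",
}


def reaches_console(message_level, console_loglevel):
    """A message is shown on the console only if it is more severe
    (numerically lower) than console_loglevel."""
    return message_level < console_loglevel


def levels_shown(console_loglevel):
    """Names of the priorities that reach the console at this setting."""
    return [name for level, name in sorted(KERN_LEVELS.items())
            if reaches_console(level, console_loglevel)]
```

With "quiet" (level 4) only KERN_ERR and more severe messages reach the console, so warnings, notices and info messages are skipped there; at level 7 everything except KERN_DEBUG is printed. That fits the symptom in this report, where the two pr_err timeout lines are still visible on a quiet boot. Messages at every level are still recorded in the kernel log buffer; only the console output changes.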

Vladimir Meremyanin (v-stiff) wrote :

> Vladimir, that is great news!

Yeah, I thought so too, until the most recent boots: it hung twice with ec_intr=1. It booted only after I detached the power cord.

TJ (tj) wrote :

Vladimir - that doesn't surprise me! I got the "ec_intr" setting from Greg Kroah-Hartman's Linux Kernel in a Nutshell, kernel parameters. However, when I searched the source to find out what it does I find it doesn't exist!

The good news is 2.6.25-rc9 isn't affected, so hopefully we can identify the commits that have changed the behaviour. Looks like we're back to removing the "quiet" option until then (or leaving AC unplugged/switched off).

tomd123 (tdziedz2) wrote :

I can confirm this bug in 8.04 release candidate on my VAIO vgn-fe880e laptop! I hope this gets fixed, my installation will only work after a couple of retries. I just keep rebooting till it works, and that works, but it occasionally pops up. Please don't let this bug get through to the 8.04 LTS release :(

Trevor Nightingale (trevornightingale) wrote :

I can confirm this bug in the 8.04 release on my Sony Vaio VGN-N230E.

fralonso (franky4dedos) wrote :

I can confirm this bug in the 8.04 release on my Sony Vaio VGN-38L/W

Jonas Steinmann (steinmann-jonas) wrote :

I can confirm this bug in the 8.04 release on my Sony Vaio VGN-N21Z.

dxmaster (dxmaster) wrote :

I can confirm this bug in the 8.04 release on my Sony Vaio VGN-N395E.

andyh303 (andyh303) wrote :

I can confirm this bug in the 8.04 release on my Sony Vaio VGN C2S/H

Ray (bernhard-posselt) wrote :

I can confirm this bug in the 8.04 release on my Sony Vaio VGN-FE41Z

Aaronc (hiaaronle) wrote :

I can confirm this bug in the 8.04 release on my Sony Vaio VGN-C21CH/B

Chris D (errortype3) wrote :

I can confirm this bug in the 8.04 LTS release on my Sony Vaio VGN-N250E. But removing the 'quiet splash' from the kernel line in /boot/grub/menu.lst seems to have solved the problem for now. Thanks a bunch!

Thomas McKay (tom-mckay1) wrote : Re: [Bug 191137] Re: [Hardy] ACPI Embedded Controller (EC) stops boot when kernel boot 'quiet' option is enabled or AC power is connected

bug persists in hardy release on my sony vaio vgn-c240e

Matt K (matt1632) wrote :

Hmm, it seems this bug affects mostly Sony Vaio laptops. I have this problem with the live cd on my sony vaio VGN-FE870E. I was planning to test out Hardy using the Wubi installer. How will I set quiet boot without using GRUB? Will this bug be fixed and implemented into hardy or will we have to wait until 8.10?

Thomas McKay (tom-mckay1) wrote :

Matt,

you can configure the boot options by pressing escape at the grub count
down, pressing "e" to enter the edit mode, then edit what you wish. Pressing
"b" when you are done will boot the kernel with your modified options. That
should at least get you to the installer.

i actually don't know the first thing about WUBI, or how to solve this
problem to work with WUBI, but good luck to you sir.

Dave Rice (ricey) wrote :

I'm wondering if this is the same thing that affects the ASUS EeePC 701 4G

I don't see the errors referring to ACPI at boot time, but a stock boot from LiveCD 8.04 release with an SDHC card in the slot fails at detecting the USB subsystem.

Removing the 'quiet' boot fails
Removing 'quiet and splash' boot fails
Removing the SD card allows booting
Adding 'ec_intr=1' allows booting (only tried once though)

on an installed system:

Removing Card allows booting
Removing 'quiet' allows booting
default settings boot takes over 3 minutes to complete the boot. It pauses after CPU detection.

I'm not sure it's quite the same but there is an air of similarity to it.

I can attach ACPI details if you think it would be useful or relevant

cheers :)

Maurizio (fiz-ban) wrote :

Hello all,

I have the same bug in 8.04 LTS on my IBM R51e.
Using kernel 2.6.24.16 or 2.6.24.17, the system starts only with the power cable unplugged.
With the old kernel 2.6.22-14, the system starts without problems.

Regards

evkefalas (vkefalas) wrote :

same problem for me
sony vaio, worked fine on 7.10
now i have this stupid problem on 8.04

Revision history for this message
Rodia (zirrara) wrote :

I can confirm this bug in the 8.04 release on my Sony Vaio VGN-N31L

Revision history for this message
Cristian T (cristroncos) wrote :

Hi everyone,

   I can confirm the exact same bug on my Sony Vaio VGN-C240E. This bug happens when I try to boot using the 2.6.24-16 or 2.6.24-17 kernels. However, everything works fine with the 2.6.22-14 kernel. Moreover, when I do succeed in booting with 2.6.24-17, I lose my audio and my wireless.

Revision history for this message
blightzero (blightzero) wrote :

I have the same bug on my Sony Vaio VGN-C2Z/B. Although I can boot in recovery mode or with the power unplugged, there still seems to be some other ACPI problem: the CPU is running a whole lot warmer than with the 2.6.22 kernel from 7.10, and recharging does not seem to work. It never fully recharges. This seems to be more than just a boot time problem.

Revision history for this message
Brinley Ang (brinley) wrote :

I can confirm this bug in the 8.04 64-bit release on my Sony Vaio VGN-FE48G

Revision history for this message
Andrew Cox (acox-uow) wrote :

I can confirm this bug on the 8.04 release with kernel 2.6.24-16, on my Sony Vaio VGN-C25G. Removing "quiet splash" from the boot command, and removing the power cord has allowed it to boot successfully (?).

My CPU and GPU seem to be running hotter than usual though.

Revision history for this message
bekirserifoglu@gmail.com (bekirserifoglu) wrote :

i can confirm this bug on sony vaio fz 190. however it doesn't happen all the time. when it gives this acpi message, i can't see any info about my battery or my cpu's heat etc.

I compiled the kernel 2.6.25.4 myself and the problem seems to be gone.

Revision history for this message
Vladimir Meremyanin (v-stiff) wrote :

Bekir, it's Awesome!

Can you tell for lazy us (me at least) how to do that? :)

Do you have NVidia driver, wifi working?

I recently accidentally upgraded to 2.6.24-17 from repository, now I can't
compile it - running
apt-get source linux-image-generic

downloads 2.6.24-16.30 :(

Revision history for this message
lizardmenke (lizardmenke) wrote :

I have this problem with sony vgn-n11s kernel 2.6.24-17-generic and dualboot with Win-XP.
When the problem occurs I push power button to stop the laptop.
Then I boot into Win-XP and let win do a disk check-and repair on C: drive.
Win does the check next time it boots up.
After Win did the C: drive check and the obligatory reboot, I shut down the laptop.
Then I start it up again and everything is fine and the system boots automaticaly into Ubuntu.
Until after +/- 30 days or so and as many or more boots (only Ubuntu) It happens again and I must do the same 'trick' once more.

Revision history for this message
bekirserifoglu@gmail.com (bekirserifoglu) wrote :

here is a humble how to for you, Vladimir. but i didnt compile ubuntu kernel i compiled the one from kernel.org. it seems to work better now.

there are a lot of how to's for compiling a kernel but i usually follow this one from ubuntu forums:
http://ubuntuforums.org/showthread.php?t=311158
it is almost foolproof. if you follow the steps in the thread you will get the old kernel's configuration into the new one, which is what we want. you should also choose your processor type so that your kernel works better. before compiling, the terminal will ask you some questions; be careful with them. if it asks whether it should include a feature or not, answer according to your needs. for example, there was a new feature for intel 4965 wireless, and i have an intel 4965 wireless card, so i chose it as a module.
after the compiling the kernel you should have your firmwares otherwise your wireless card etc will not work. for this i simply copy old firmware folder with the new name. so i just moved the contents of /lib/firmware/2.6.24-17-generic to /lib/firmware/2.6.25.4 . create the folder if you dont have one. folder name may vary according to the kernel version you compiled. the one in the thread is 2.6.25.2 but i compiled 2.6.25.4 since it is the latest. you can do that by replacing "2.6.25.2"s with "2.6.25.4"s while you are following the steps.
after i installed the new kernel i checked the sound. it was working but kinda buggy. so i decided to install latest alsa driver but there was a problem. latest alsa-kernel has a problem. more info and a patch could be found here:
http://hg.alsa-project.org/alsa-kernel/rev/2d6164f0bf0e
i didn't use the patch, i fixed the problem manually.
So to fix the problem i have changed the lines in alsa-kernel in alsa driver. here is the modified alsa driver:
http://rapidshare.com/files/117590842/alsa_modified.tar.bz2
if you dont know how to install alsa, here is a perfect how to for installing alsa drivers:
https://help.ubuntu.com/community/HdaIntelSoundHowto
but it is old. you should download and install the latest alsa which is 1.0.16. follow this guide and install alsa. use my modified alsa-driver instead of downloading it from alsa-project. download alsa-utils and alsa-lib from their site though.
after installing alsa modules, i followed this guide for pulse audio:
http://ubuntuforums.org/showthread.php?p=4928900
now the sound works like charm.
but we have another problem: nvidia drivers. you can't install nvidia drivers through synaptic or the ubuntu repos anymore. i downloaded the latest beta nvidia driver from their site and installed it manually. installation was smooth but the driver was not working properly, since it is beta and the xorg in hardy is beta too. so i decided to install 169.09. i downloaded it from nvidia's site but it was not working with the latest kernel either. fortunately there is a solution: patching.
http://www.nvnews.net/vbulletin/showthread.php?t=110088
more information could be found in the thread. i pressed ctrl+alt+f1 then i gave the command "killall gdm" then followed the steps in the nvidia's site and the forum. first you must patch nvidia's driver following the steps...


Revision history for this message
bekirserifoglu@gmail.com (bekirserifoglu) wrote :

hey i checked the latest kernel's source but it doesn't seem to be patched with the "disable early enable of boot_ec" patch. the code is still the same, but the problem is gone. i don't know. maybe you should include this patch while compiling your new kernel.
the problem is gone for me anyway.

Changed in linux:
status: Incomplete → Invalid
Revision history for this message
Thomas McKay (tom-mckay1) wrote :

This bug is persisting, what measures are being taken to fix this?

--
Tom McKay

Revision history for this message
wvengen (wvengen) wrote :

For one thing, upstream has marked the bug as invalid since they got no response. So if anyone who experiences the problem and can compile his own kernel from upstream could test, please answer upstream (or here if you don't want to create a bugzilla account there).
  http://bugzilla.kernel.org/show_bug.cgi?id=10444#c9
This still leaves open Thomas' question.

Revision history for this message
Nicolas Vandeput (nvdp-deactivatedaccount) wrote :

I have the same bug on my Sony Vaio VGN-FE48E with Hardy Heron, kernel 2.6.24-17-generic. It happens sometimes; I just have to turn my computer off the hard way and turn it back on. I'm not able to compile a kernel, so I can't help you test the patch...

Revision history for this message
Vincent Picavet (vincent-picavet) wrote :

I can confirm this bug in the 8.04 release on my Sony Vaio VGN-FE41E.
Disabling "quiet splash" seems to let the kernel boot.

Revision history for this message
Hao Zhe XU (haozhe.xu3) wrote :

After removing quiet option on my sony vaio C25G, I still got this error:

[17.896085] ACPI Error(something-0537): Method parse/execution
failed [\_SB_.PCI0.LPCB.EC__._REG] (Node f7c4af1, AE_TIME
[17.896783] pnp: PnP ACPI init
... something here ...
[18.450851] ACPI: bus type pnp registered

and the system just hangs there. As a result, I cannot use the 8.04 version of Ubuntu: even though I can boot the livecd with 'quiet' removed and install it on my laptop, I cannot boot the installed system.

I waited several weeks and the problem has not been solved yet.

Revision history for this message
Keith Drummond (kd353) wrote :

I can confirm this on a Sony Vaio VGN-N31S

Has a 'fix' for this been found or are they skipping on to Intrepid Ibex?

Revision history for this message
kalebdf (kalebdf) wrote :

I can confirm this bug in the 8.04 release on my Sony Vaio VGN-N365E/B.

I can definitely boot up by unplugging before I turn on the laptop and then plugging it in after logging in.
However, like everyone states, it freaks out spitting out ACPI Errors if it is plugged in.

Is there an official bug / kernel patch for this yet? I will try turning quiet off for now.

Disappointing since it seems like 7.10 was working fine.

-Kaleb

Revision history for this message
Nicolas Vandeput (nvdp-deactivatedaccount) wrote :

On my computer, it happens even if the AC is unplugged! I just have to turn it off and then back on, and generally, I have to do it two times. Adding the 'noapic' option at kernel boot (in /boot/grub/menu.lst) resolved the problem (I also tried with noacpi but this won't let my computer boot!).

Revision history for this message
Xavier Orr (xavierorr) wrote :

I can confirm this bug in the 8.04 release on my Sony Vaio VGN-C25G. why is it taking so long to be fixed ?

Revision history for this message
Ernesto Rico-Schmidt (nnrcschmdt) wrote :

I can also confirm the bug on a Sony VAIO VGN-N230N.

Can we expect the fix to be included in the upcoming point release (8.04.1)?

Revision history for this message
Keith Drummond (kd353) wrote :

I do not think this will be fixed in 8.04.1

Post #2 here: http://ubuntuforums.org/showthread.php?t=835987

Revision history for this message
Stefan Bader (smb) wrote :

This currently is a bit poking around in the dark but I put a test kernel at
http://people.ubuntu.com/~smb/bug191137 which has a cherry pick that might be related (a change to keep the boot_ec). If someone wants to try, this would be great.

Revision history for this message
Nicolas Vandeput (nvdp-deactivatedaccount) wrote :

Hi,

I tried your kernel: I installed the package, rebooted, chose your kernel and this simply didn't change anything. I got the same two error lines, and I had to reboot my computer 3 times with your kernel in order to boot it correctly.

So I just removed the package again... And I go on using the 'ro quiet splash noapic' options with the other kernel (2.6.24-16) so that the error happens less often.

I think I'll buy another laptop. Sony vaio's really suck with GNU/Linux.

Revision history for this message
Stefan Bader (smb) wrote :

Thanks Nicolas. As I said, this was rather poking around a bit. Looking at the code I think there might be a chance for a race, but I think I need some help from some upstream folks. If I don't get any response, the best way to proceed would be to revive the kernel bug, but this also requires at least one of you (you as in all those who suffer from this) to help test and provide information.

Revision history for this message
Stefan Bader (smb) wrote :

I have prepared another kernel at http://people.ubuntu.com/~smb/bug191137 which does not contain a fix but some debugging statements. It would be great if anybody could try this and post the system messages printed. Thanks!

Revision history for this message
wvengen (wvengen) wrote :

Cold booted a Sony VAIO VGN-FE41E with Stefan's kernel. I had to remove the power cord to keep it from panic-ing (the stacktrace needs more than the whole screen so I can't see the messages in that case; I have no serial port to try serial console). With quiet option, this was displayed on screen:
  [ 13.849997] ACPI: EC: boot_ec created
  [ 13.862159] ACPI: EC: acpi_boot_ec_enable
  [ 14.360831] ACPI: EC: acpi_ec_wait timeout, status=0, expect_event=1
  [ 14.360831] ACPI: EC: read timeout, command=128
  Loading, please wait...
  [ 148.955486] BUG: soft lockup - CPU#1 stuck for 11s! [modprobe:1290]
  (repeats with different timestamps and hangs)

The dmesg of the next warm boot without the 'quiet' kernel option with the power present is attached. EC messages:
  [ 48.202845] ACPI: EC: boot_ec created
  [ 48.202910] ACPI: EC: Look up EC in DSDT
  [ 48.210325] ACPI: EC: acpi_boot_ec_enable
  [ 48.210738] ACPI: EC: non-query interrupt received, switching to interrupt mode
  [ 48.214017] ACPI: EC: acpi_boot_ec_enable success
  [ 48.232768] ACPI: EC: GPE = 0x17, I/O: command/status = 0x66, data = 0x62
  [ 48.232838] ACPI: EC: driver started in interrupt mode
Another successful boot had exactly the same EC messages.
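When comparing several boots like the two above, it can help to pull only the EC lines out of a saved dmesg and check for the timeout. This is a minimal sketch, assuming the log is available as plain text in the format quoted here; the function names are made up for illustration.

```python
import re

# Matches lines such as "[ 14.360831] ACPI: EC: read timeout, command=128".
EC_LINE = re.compile(r"^\s*\[\s*(\d+\.\d+)\]\s+ACPI: EC: (.*)$")

def ec_messages(dmesg_text):
    """Return (timestamp, message) pairs for every ACPI EC line in the text."""
    found = []
    for line in dmesg_text.splitlines():
        match = EC_LINE.match(line)
        if match:
            found.append((float(match.group(1)), match.group(2)))
    return found

def ec_timed_out(dmesg_text):
    """True if any EC line reports a timeout."""
    return any("timeout" in message for _, message in ec_messages(dmesg_text))
```

Running ec_timed_out() over the saved log of each boot gives a quick good/bad answer per boot.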

Revision history for this message
Stefan Bader (smb) wrote :

This sounds to me like my suspicions were right. Somehow the code to install the boot_ec handlers is called before the context has been created. Both acpi_scan_init() (which tries to install the handlers) and acpi_ec_dsdt_probe() (which allocates the structure and would create the context) run concurrently but do not protect against this. I tried to reopen the kernel bugzilla, but this can only be done by TJ or ykzhao (I wrote an email to him). In the meantime I will try to think of a way around that.

Stefan Bader (smb)
Changed in linux:
assignee: ubuntu-kernel-acpi → stefan-bader-canonical
Revision history for this message
Stefan Bader (smb) wrote :

I tried to come up with a way I think this might be solved. There is no way to tell whether this is acceptable upstream. As far as I understand things, the intention of the call from acpi_scan_init() is to have the handlers installed. This seems to only make sense after the boot_ec has been set up (which also enables the handlers if successful, otherwise boot_ec might even be freed). So I changed the code so that acpi_boot_ec_enable() will only wait until acpi_ec_dsdt_probe() has finished. While this should work around the problem here, I am not sure whether it helps with the problem that the introduction of acpi_boot_ec_enable() was meant to solve.
The new kernel is at the same location as the last one (smb3).
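The idea of making one step wait until another has finished can be illustrated in user space. The following is only an analogy in Python, not the kernel patch: threading.Event stands in for whatever signalling the kernel change uses, and the two functions merely borrow the names from the discussion above.

```python
import threading

def run_once():
    """Start the waiter first, then the probe, and return the order of events."""
    probe_done = threading.Event()   # user-space stand-in for the kernel signal
    events = []

    def dsdt_probe():
        events.append("probe: boot_ec set up")
        probe_done.set()             # signal that setup has finished

    def boot_ec_enable():
        probe_done.wait()            # block until the probe has signalled
        events.append("enable: continuing after probe")

    waiter = threading.Thread(target=boot_ec_enable)
    prober = threading.Thread(target=dsdt_probe)
    waiter.start()                   # deliberately started before the probe
    prober.start()
    waiter.join()
    prober.join()
    return events
```

Even though the waiting thread is started first, the probe entry is always recorded before the enable entry, which is the ordering the change is meant to enforce.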

Revision history for this message
Nicolas Vandeput (nvdp-deactivatedaccount) wrote :

You're great!!

It now works perfectly. I even tried more than 10 times to be sure of it. It shows the following messages:

[ 17.XXXXXX] ACPI: EC: boot_ec created
[ 17.XXXXXX] ACPI: EC: waiting for boot ec setup
[ 17.XXXXXX] ACPI: EC: done

And then it boots without any problem. I then tried to reboot with my old kernel, and it blocked 4 times before I could boot (because I wanted to keep my drivers and resolution to be able to reply)... The bug appears to be solved.

Many thanks! I hope that this will be soon included in the updates. I'd pay you a beer if I could :)

Revision history for this message
Hao Zhe XU (haozhe.xu3) wrote :

It works now, will it be included in the next updates?

Revision history for this message
TJ (tj) wrote :

Stefan, sorry I've not been active on this for a while. I got around it by using the latest mainline kernel from my local git repository since I'm currently writing new PCI dynamic resource allocation functionality for mainline.

I got caught by this bug again today with the Hardy LiveCD when finally deciding to update the laptop's primary OS from Gutsy to Hardy, doing a clean install and using LUKS encryption with key-file.

If you want me to reopen the kernel bugzilla let me know.

I'm doing a clean install of Hardy to the laptop (after moving the Gutsy install to a back-up system) so I'll test your kernel with that and let you know the results.

Revision history for this message
Stefan Bader (smb) wrote :

TJ,

give me a second. There will be another kernel soon (smb4) which is the result of some discussion with upstream.
It would be great if you could give that a try.

Revision history for this message
Stefan Bader (smb) wrote :

Ok, long description follows. I am currently uploading the i386 kernel. The one for amd64 will follow shortly. After some discussion the agreement was that this problem arises from the (so it seems) incorrect assumption that subsys_initcall functions will get executed in the order they are defined in the makefile. But from the debug output it clearly looks like those functions can even be executed concurrently, which can cause a lot of races that were not thought of. The new kernel just moves some subsys_initcall functions into the bus.c init function. This will guarantee they are executed in an orderly fashion.

Revision history for this message
Stefan Bader (smb) wrote :

I know I am pressing, but has anybody tried the smb4 kernel? I would love to see the fix in the next Hardy update but I need to know whether it helps with this problem.

Revision history for this message
jeremylee (jeremylee1228) wrote :

Sorry, I am new here. I have tried your smb4 kernel, but the problem still exists. smb3 works for me though...

Revision history for this message
Stefan Bader (smb) wrote :

That is pretty bad. I was quite sure it should have the same effect in the end. But unfortunately I made that kernel in a bit of a hurry. I think I will create another one with some more debug output to see the flow of execution.

Revision history for this message
Stefan Bader (smb) wrote :

Ok, I got smb5 at http://people.ubuntu.com/~smb/bug191137. This one has more debug output. So even if it doesn't work, please post the messages.

Revision history for this message
TJ (tj) wrote :

Stefan.

I have just completed my Hardy installation. I installed " linux-image-2.6.24-19-generic_2.6.24-19.34smb5_amd64.deb " and rebooted three times with

kernel /vmlinuz-2.6.24-19-generic root=/dev/mapper/VGencrypted-root ro quiet splash

and all three times the system started successfully.

dmesg shows:

[ 20.495528] ACPI: bus type pci registered
[ 20.495598] PCI: Using configuration type 1
[ 20.496713] ACPI: EC: acpi_ec_ecdt_probe() created boot_ed
[ 20.496771] ACPI: EC: Look up EC in DSDT
[ 20.496773] ACPI: EC: use EC in DSDT
[ 20.502636] ACPI: EC: no EC._INI
[ 20.504441] ACPI: Interpreter enabled
[ 20.504443] ACPI: (supports S0 S3 S4 S5)
[ 20.504457] ACPI: Using IOAPIC for interrupt routing
[ 20.504566] ACPI: EC: acpi_boot_ec_enable()
[ 20.504909] ACPI: EC: non-query interrupt received, switching to interrupt mode
[ 20.508199] ACPI: EC: acpi_boot_ec_enable() successful
[ 20.526852] ACPI: EC: GPE = 0x17, I/O: command/status = 0x66, data = 0x62
[ 20.526855] ACPI: EC: driver started in interrupt mode
[ 20.526895] ACPI: PCI Root Bridge [PCI0] (0000:00)

Revision history for this message
Nicolas Vandeput (nvdp-deactivatedaccount) wrote :

Hi! I just tried the last one. Seems it doesn't work better.

When it works (most of the time), it shows:

[ 17.XXXXXX] ACPI: EC: acpi_ec_ecdt_probe() created boot_ed
[ 17.XXXXXX] ACPI: EC: use EC in DSDT
[ 17.XXXXXX] ACPI: EC: no EC._INI
[ 17.XXXXXX] ACPI: EC: acpi_boot_ec_enable()
[ 17.XXXXXX] ACPI: EC: acpi_boot_ec_enable() successful

But sometimes, I also get this error (I'd say every ten boots):

[ 17.XXXXXX] ACPI: EC: acpi_ec_ecdt_probe() created boot_ed
[ 17.XXXXXX] ACPI: EC: use EC in DSDT
[ 17.XXXXXX] ACPI: EC: no EC._INI
[ 17.XXXXXX] ACPI: EC: acpi_boot_ec_enable()
[ 17.XXXXXX] ACPI: EC: acpi_ec_wait timeout, status = 0, expect_event = 1
[ 17.XXXXXX] ACPI: EC: read timeout, command = 128

Revision history for this message
Stefan Bader (smb) wrote :

This is very strange. The change seems to help TJ but in Nicolas case there still seems to be a problem. However the handlers are not installed before boot_ec has been finally created. So this might be a different race and frankly I am currently a bit clueless. Since the other kernel works this sounds like a small difference in timing does change a lot.

While I try to find any clues, Nicolas, could you add a bit more info about your machine? What Model and most interesting what type/speed of CPU(s).

Revision history for this message
TJ (tj) wrote :

I've been doing a series of restarts whilst debugging a suspend-lock-up issue, and this problem has occurred several times - sometimes accompanied by a BUG soft lock-up on one of the CPUs.

So it looks like the situation has improved slightly but not gone away.

When I was running Gutsy I solved it by using the latest mainline kernel (2.6.25+). It's possibly worth doing a git-bisect between 2.6.24-19 and 2.6.25-rc9.

I did a basic review of the commits but can't see anything obvious, but then again I may have set the criteria too narrowly:

git log --pretty=medium v2.6.24..v2.6.25-rc9 -- drivers/acpi/scan.c
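For anyone unsure what a bisect between the two versions would involve, here is a rough Python sketch of the search itself. It is not git; it only assumes an ordered list of commits in which the oldest is known broken, the newest is known fixed, and the fix stays in once it lands. The names first_fixed and is_fixed are made up for illustration.

```python
def first_fixed(commits, is_fixed):
    """Binary-search an ordered commit list for the first commit that is fixed.

    Assumes commits[0] is broken, commits[-1] is fixed, and that every
    commit after the fixing one is also fixed.
    """
    lo, hi = 0, len(commits) - 1     # lo: known broken, hi: known fixed
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if is_fixed(commits[mid]):
            hi = mid                 # fix is at mid or earlier
        else:
            lo = mid                 # fix is after mid
    return commits[hi]
```

Each call to is_fixed() stands for building and booting one kernel. Since the hang is intermittent (about one boot in ten in one report above), a single clean boot proves little, so every step would need a number of reboots before a commit can be called fixed.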

Revision history for this message
Nicolas Vandeput (nvdp-deactivatedaccount) wrote :

Well, I've got many problems on this machine that don't get solved. It is a Sony vaio VGN-FE48E. I really think I'll buy a new one to use GNU/Linux! Most problems seem to come from the association of my Intel Centrino Duo with a Nvidia Geforce Go 7400. See the file attached for more info.

Revision history for this message
Stefan Bader (smb) wrote :

After a night's sleep and somewhat lower temperatures this morning the riddle is not as mysterious as yesterday. What I forgot somehow was that in the smb3 kernel (when I used completion), I also removed the enable call for the handlers from acpi_boot_ec_enable(). This was done under the false assumption that the handlers get installed in most relevant cases. But as the logs of both of you show, acpi_ec_ecdt_probe() will try to get information from the DSDT (after failing to get it from ECDT), but then (because there is no (fake) EC._INI method) will stop with a boot_ec allocated but without handlers installed.

Now the question is whether this throws us back to the beginning. The idea about the incorrect serialization came from wvengen's comment (https://bugs.launchpad.net/ubuntu/+source/linux/+bug/191137/comments/86), which looked as if in some cases acpi_boot_ec_enable() was called before the message about DSDT was printed (which sounded pretty much as if that code was executed in parallel). But the logs of the last kernel show that (at least now) the sequence is ok, yet the installation of the handlers fails in some cases. Either there is still something running out of order or this could also be some sort of hardware dependency (like needing a bit of time until being ready).

Revision history for this message
TJ (tj) wrote :

Nicolas: I've been running Vaios for a long time and not seen major problems. The laptop currently affected by this bug is a VGN-FE41Z with Intel T7200 and Nvidia Go 7600. It has been reliable and stable.

Stefan: Can you attach the source patch you're using against the current Hardy git HEAD so I can play around more intimately with it? Also, is there a mailing-list exchange on your upstream discussions about this I can review? As I said in an earlier comment, as soon as I began dropping in even the slightest debug reporting the issue went away so I left off work on it.
I'm convinced we should be able to identify a change between Hardy and mainline 2.6.25-rc9 that solved this since that was the mainline version I began working with on the PCI DRA functionality where I noticed this issue had gone away. It is of course possible that the underlying issue is still there but a slight timing change hides it. That said, the issue wasn't present in Gutsy either so we have a range of known git tags to work between.

Revision history for this message
Stefan Bader (smb) wrote :

The discussion was rather privately with more or less only Alexey Starikovskiy responding. And after seeing the patch that i used for serialization failing too, I reviewed the comment that made me think there is a race at that place. Unfortunately it turns out to be a false positive. The messages were taken from different sources, so some other output is naturally missing.
That said, it really sounds like it is only a matter of the printk's that make a difference. Gutsy did not show the problem since the change to enable/install the ec_handlers went into hardy only. So what we know is, installing the handlers fails if the timing is not right. If there is enough delay until that function is called it is ok. Otherwise there are problems.
It still might be a matter of slightly different timing upstream, but if you could do a git bisect that would be very valuable. For me it would just be the same: trying to offer kernels with more and more ACPI patches from upstream until there is a noticeable change. But of course there is always a delay to build those kernels and then push them...

Revision history for this message
TJ (tj) wrote :

I've done a few restarts with 2.6.26-rc9 (Intrepid) without incident.

I tried adding "acpi_serialize" to the kernel command line for 2.6.24-19 and it *seemed* to improve matters but then the system experienced a later lock which may or may-not have been related. I tried "max_cpus=1" on the basis that scheduling across SMP cores might be an issue but the failure still occurred.

I've also been looking at the changes to the (Group) CPU Scheduler between 2.6.22 .. 2.6.24 and 2.6.24 .. 2.6.26 (and specifically the differences in the kernel .config files between Gutsy, Hardy, Intrepid, and mainline). If the theory about timing is highlighting a concurrency issue, the cause might be nowhere near the ACPI code.

Revision history for this message
Stefan Bader (smb) wrote :

I added the following two patches to the current debug kernel (both changing the ec_space_handler) and uploaded a smb6 kernel. Unfortunately this is not a very efficient way to proceed.

commit b3b233c7d948a5f55185fb5a1b248157b948a1e5
Author: Alexey Starikovskiy <email address hidden>
Date: Fri Jan 11 02:42:57 2008 +0300

    ACPI: EC: Some hardware requires burst mode to operate properly

    Burst mode temporary (50 ms) locks EC to do only transactions with
    driver, without it some hardware returns abstract garbage.

    Reference: http://bugzilla.kernel.org/show_bug.cgi?id=9341

    Signed-off-by: Alexey Starikovskiy <email address hidden>
    Signed-off-by: Len Brown <email address hidden>

commit 3e71a87d03055de0b8c8e42aba758ee6494af083
Author: Alexey Starikovskiy <email address hidden>
Date: Fri Jan 11 02:42:51 2008 +0300

    ACPI: EC: Do the byte access with a fast path

    Specification allows only byte access for EC region, so
    make it separate from bug-compatible multi-byte access.
    Also do not allow return of garbage in supplied *value.

    Reference: http://bugzilla.kernel.org/show_bug.cgi?id=9341

    Signed-off-by: Alexey Starikovskiy <email address hidden>
    Signed-off-by: Len Brown <email address hidden>

Revision history for this message
Stefan Bader (smb) wrote :

TJ, as I would like to learn a bit more about the ACPI differences, could you post the DSDT you are using on your Sony?

Revision history for this message
TJ (tj) wrote :

Stefan: You can find a tar.gz of most (~190) decompiled Sony Vaio DSDTs
attached to my SNC analysis page at http://tjworld.net/snc/ (the
"Download all disassembled DSDT (.dsl) files" link).

Based on the collection of DSDTs I have I did post a warning to the
Ubuntu kernel mailing-list on 12 April 2008 with a list of the models I
suspected would be affected, based on my earlier analysis
(https://bugs.edge.launchpad.net/ubuntu/+source/linux/+bug/191137/comments/28) which focused on the use of a STORE() operation in EC._REG().

VGN-AR31S-R0200J6, VGN-AR370E-R0200J6, VGN-C140G-R0030J4, VGN-C1S,
VGN-C1ZB-R0034J4, VGN-C22GH-R0080J4, VGN-C240E-R0080J4, VGN-C2S-R0080J4,
VGN-C2Z-R0080J4, VGN-FE11H-R0072J3, VGN-FE11H-R0074J3,
VGN-FE11M-R0172J3, VGN-FE11M-R0174J3, VGN-FE21H-R0100J3,
VGN-FE21M-R0130J3, VGN-FE31M, VGN-FE31M-R0170J3, VGN-FE41E-R0190J3,
VGN-FE41M-R0190J3, VGN-FE41Z-R0200J3, VGN-FE45G-R0190J3,
VGN-FE550G-R0074J3, VGN-FE590P-R0072J3, VGN-FE660G-R0133J3,
VGN-FE670G-R0130J3, VGN-FE690-R0172J3, VGN-FE770G-R0173J3, VGN-FE830,
VGN-FE870E-R0190J3, VGN-FE880EH-R0200J3, VGN-FS115M-R0104J0,
VGN-FS215B-R0040J1, VGN-FS215E-R0040J1, VGN-FS285H-R0040J1,
VGN-FS315E-R0084J1, VGN-FS315H-R0080J1, VGN-FS315S-R0084J1,
VGN-FS660W-R0044J1, VGN-FS730W-R0080J1, VGN-FS740W-R0080J1,
VGN-FS760W-R0080J1, VGN-FS965F-R0044J2, VGN-FS980-R0044J2,
VGN-FZ11M-R0050J7, VGN-N130G-R0020J4, VGN-N230E-R0070J4,
VGN-N31Z-R0100J4

Since then, based on more investigation, I wondered if it was more
likely the NOTIFY(BAT0) might be the failure point since, if the EC is
initialised before the ACPI namespace has been created then it is likely
the BAT0 object hasn't been created.

I generated a new DSDT that omits the Notify(BAT0) and installed it with
update-initramfs. The tests I did indicated the contents of the EC_REG()
method weren't the cause, though.

I find it strange, however, that so far only Sony Vaios
are being reported with this issue. Maybe other makes are affected but
they're not coming to our attention.

In analysing the issue I did notice that there is a two second
difference in the timing of calls to EC_REG() between "quiet" and
verbose boot sessions. If concurrency/scheduling is the root cause of
this issue that would correlate nicely.

You can also find some commentary on the small DSDT change I made to fix
a battery reporting issue in "ACPI: battery-technology reported as
non-rechargeable" http://ubuntuforums.org/showthread.php?t=475801.

Revision history for this message
Stefan Bader (smb) wrote :

TJ, I must admit I am running out of ideas. I took a look at VGN-FE41Z-R0200J3.dsl (because I think you mentioned somewhere this is the model you are currently using). The _REG method has a lot of stuff in it compared to the one from my Thinkpad (I also noticed the PNOT method doing another Notify).
But how this causes the problems is not really clear. Either by link order and the subsys calls or explicitly (with the patch I did) the execution flow is:

1. acpi_init
1.1 acpi_bus_init
1.1.1 acpi_ec_ecdt_probe (This creates the boot_ec but does not install the handlers)
2. acpi_scan_init (first register the acpi bus and then root object plus sleep and power button)
2.1 acpi_boot_ec_enable (this will first install the GPE and then the space handler which calls _REG)
2.2 acpi_bus_scan (I think this sets up the devices)

What I cannot understand is the fact that being slower in execution is better. If the problem is missing entries, this should get worse if it takes longer to process further. The only reason things could be the way they are (and unfortunately this is only a blind guess since I do not know the ACPI area so well) might be that this triggers something on the hw side which takes a bit, and if the progress is too quick the hw is not ready. But that is really nothing to work with.

So at the moment I think the best way to proceed is to reopen the kernel bugzilla and try to supply them with info and feedback.

Revision history for this message
TJ (tj) wrote :

Stefan, it looks like we're working along the same lines - more of which later.

Although I could re-open the bugzilla report the problem is, the first question asked is "does this affect the latest version?" and the answer is "NO" - not since 2.6.25-rc9 that I know about, and possibly earlier. With that I doubt there will be much interest in solving this since it'll be a case of "use the latest kernel".

I'd like to get a bit further in understanding the issue first.

I've attached a git-diff patch "ec_debug ACPI EC delayed printk() messaging" against the current ubuntu-hardy HEAD (commit 135a18b9c37be8...).

It introduces buffered printk() messages via calls to ec_debug(const char *func, const char *format, args...). It allows us to add as many debug messages as we want without affecting the timing because they are not sent to the console immediately. It collects the messages in the buffer and auto-flushes if it gets full and, more importantly, immediately after the 2 known code-points where the EC either succeeds or fails.
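
To illustrate the idea of buffering debug output and flushing it later, here is a minimal user-space sketch in Python. It is not the attached patch: the class name `EcDebugBuffer`, the `capacity` default and the `sink` parameter are assumptions made for the example, and only the ec_debug(func, format, args...) call shape is taken from the description above.

```python
import time


class EcDebugBuffer:
    """Collects debug messages and prints them later in one go."""

    def __init__(self, capacity=64, sink=print):
        self.capacity = capacity  # assumed limit, not the size used in the patch
        self.sink = sink          # where flushed lines go (console by default)
        self.messages = []

    def ec_debug(self, func, fmt, *args):
        # Record when the event happened, not when it is finally printed.
        stamp = time.monotonic()
        text = fmt % args if args else fmt
        self.messages.append((stamp, func, text))
        if len(self.messages) >= self.capacity:
            self.flush()  # auto-flush once the buffer is full

    def flush(self):
        # Called explicitly at the code-points where the outcome is known.
        for stamp, func, text in self.messages:
            self.sink("[%12.6f] ACPI: EC: %s %s" % (stamp, func, text))
        self.messages.clear()
```

Nothing reaches the sink until flush() runs, so adding more ec_debug() calls only costs a list append on the timing-sensitive path.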

The resulting /var/log/dmesg entries need a bit of manual fix-up since the messages are out of time-sequence and have the time-of-printing prefixed too. It is pretty simple to use an editor to tidy it up and put the messages back in sequence though. Here's an example of a successful boot with 2.6.24.3 (manually built ubuntu-hardy tree). I wish there was an easy way to capture a failed boot, but none of the Vaios have serial ports so we can't capture via a serial console. My usual method is to aim the DV-camcorder at the screen and record the output - I'll try that method later today.
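
The manual fix-up could also be scripted. The Python sketch below assumes a line layout that is not shown in this thread: a time-of-printing stamp followed by the original event stamp, as in "[print_time] [event_time] text". Lines carrying a single stamp are treated as ordinary kernel messages. The function name `resequence` is made up for the example.

```python
import re

# Assumed layout: "[print_time] [event_time] text"; the second stamp is optional.
LINE = re.compile(r"^\[\s*(\d+\.\d+)\]\s*(?:\[\s*(\d+\.\d+)\]\s*)?(.*)$")


def resequence(lines):
    """Drop the time-of-printing prefix and sort by the original event time."""
    keyed = []
    for index, line in enumerate(lines):
        match = LINE.match(line)
        if match is None:
            continue  # no timestamp at all, nothing to order by
        printed, recorded, text = match.groups()
        when = float(recorded if recorded is not None else printed)
        # The original index keeps equal timestamps in their logged order.
        keyed.append((when, index, text))
    keyed.sort()
    return ["[%12.6f] %s" % (when, text) for when, _, text in keyed]
```

If the real prefix format differs, only the `LINE` pattern needs adjusting.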

It seems to show that the init_subsys for acpi_init() and acpi_scan_init() are overlapping. I need to add additional messages to determine which subsys is responsible for the call to acpi_boot_ec_enable(). I'll try to make progress later today and report back.

In the meantime, the attached patch is available if you want to drop in your own debug tracing messages to chase down other possibilities.

[ 23.109039] ACPI: bus type pci registered
[ 23.109168] PCI: Using configuration type 1
[ 23.110342] ACPI: acpi_ec_ecdt_probe()
[ 23.110345] ACPI: EC: Look up EC in DSDT
[ 23.117192] ACPI: Interpreter enabled
[ 23.117254] ACPI: (supports S0 S3 S4 S5)
[ 23.117510] ACPI: Using IOAPIC for interrupt routing
[ 23.117676] ACPI: acpi_boot_ec_enable()
[ 23.117677] ACPI: ec_install_handlers(<ec>)
[ 23.117713] ACPI: acpi_ec_space_handler(<args...>)
[ 23.117714] ACPI: acpi_ec_read(<ec>, <addres>, <data>)
[ 23.117715] ACPI: acpi_ec_transaction(<ec>, 128, <wdata>, 1, <rdata>, 1, 0)
[ 23.117717] ACPI: acpi_ec_wait(<ec>, 2, 0)
[ 23.117718] ACPI: acpi_ec_wait() else
[ 23.117720] ACPI: acpi_ec_transaction_unlocked(<ec>, 128, <wdata>, 1, <rdata>, 1, 0)
[ 23.117723] ACPI: acpi_ec_wait(<ec>, 2, 0)
[ 23.117724] ACPI: acpi_ec_wait() else
[ 23.117940] ACPI: acpi_ec_wait(<ec>, 1, 0)
[ 23.117941] ACPI: acpi_ec_wait() else
[ 23.117999] ACPI: acpi_ec_gpe_handler(<data>)
[ 23.118002] ACPI: EC: non-query interrupt received, switching to interrupt mode
[ 23.118310] ACPI: acpi_ec_gpe_handler(<data>)
[ 23.118322] ACPI: ac...


Revision history for this message
Stefan Bader (smb) wrote :

> Although I could re-open the bugzilla report the problem is, the first question asked is "does this affect the latest
> version?" and the answer is "NO" - not since 2.6.25-rc9 that I know about, and possibly earlier. With that I doubt
> there will be much interest in solving this since it'll be a case of "use the latest kernel".

Yes, I realized later that this would very likely be happening.

> It seems to show that the init_subsys for acpi_init() and acpi_scan_init() are overlapping.

I can't see this from your data. Where do you think they overlap?

[ 23.109039] ACPI: bus type pci registered
[ 23.109168] PCI: Using configuration type 1
[ 23.110342] ACPI: acpi_ec_ecdt_probe()
[ 23.110345] ACPI: EC: Look up EC in DSDT
<there is no _INI method for EC, so boot_ec stays but handlers are not installed (-ENODEV)>
[ 23.117192] ACPI: Interpreter enabled
[ 23.117254] ACPI: (supports S0 S3 S4 S5)
[ 23.117510] ACPI: Using IOAPIC for interrupt routing
<up to here it is bus_init>
<after that it should be scan_init>
[ 23.117676] ACPI: acpi_boot_ec_enable()
[ 23.117677] ACPI: ec_install_handlers(<ec>)
<maybe it makes sense to print the addresses? The space handler would be called when something else tries to read or write to the EC address space, right? Could this be a result of the _REG functions?>
[ 23.117713] ACPI: acpi_ec_space_handler(<args...>)
[ 23.117714] ACPI: acpi_ec_read(<ec>, <addres>, <data>)
[ 23.117715] ACPI: acpi_ec_transaction(<ec>, 128, <wdata>, 1, <rdata>, 1, 0)
[ 23.117717] ACPI: acpi_ec_wait(<ec>, 2, 0)
[ 23.117718] ACPI: acpi_ec_wait() else
[ 23.117720] ACPI: acpi_ec_transaction_unlocked(<ec>, 128, <wdata>, 1, <rdata>, 1, 0)
[ 23.117723] ACPI: acpi_ec_wait(<ec>, 2, 0)
[ 23.117724] ACPI: acpi_ec_wait() else
[ 23.117940] ACPI: acpi_ec_wait(<ec>, 1, 0)
[ 23.117941] ACPI: acpi_ec_wait() else
[ 23.117999] ACPI: acpi_ec_gpe_handler(<data>)
[ 23.118002] ACPI: EC: non-query interrupt received, switching to interrupt mode
[ 23.118310] ACPI: acpi_ec_gpe_handler(<data>)
[ 23.118322] ACPI: acpi_ec_space_handler(<args...>)
[ 23.118323] ACPI: acpi_ec_read(<ec>, <addres>, <data>)
[ 23.118324] ACPI: acpi_ec_transaction(<ec>, 128, <wdata>, 1, <rdata>, 1, 0)
<The following message I would suspect comes from acpi_bus_scan() in scan_init()>
[ 23.118325] ACPI: PCI Root Bridge [PCI0] (0000:00)
[ 23.140534] ACPI: EC: GPE = 0x17, I/O: command/status = 0x66, data = 0x62
[ 23.140598] ACPI: EC: driver started in interrupt mode

> I've been thinking there's a clue in the log output when it fails:
> ACPI: EC: acpi_ec_wait timeout, status=0, expect_event=1 // event 1 = ACPI_EC_EVENT_OBF_1 =
> output buffer full
> ACPI: EC: read timeout, command=128 // 128 = 0x80 = ACPI_EC_COMMAND_READ

So something wanted to read from the EC but nothing comes back. Adding a BUG() statement in that case should answer the question of where the call comes from. And it probably makes sense to printk/debug the exact address that was requested.
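
The numbers in those two messages can be decoded mechanically. The Python sketch below uses the EC status register bits and command bytes as I understand them from the ACPI specification; the meaning given for event 2 is my reading of the EC driver of that time and should be treated as an assumption. The helper names are made up for the example.

```python
import re

# EC status register bits (ACPI specification, Embedded Controller Interface).
STATUS_BITS = {
    0x01: "OBF (output buffer full)",
    0x02: "IBF (input buffer full)",
    0x08: "CMD",
    0x10: "BURST",
    0x20: "SCI_EVT",
    0x40: "SMI_EVT",
}

# EC command bytes from the same specification.
COMMANDS = {0x80: "read", 0x81: "write", 0x82: "burst enable",
            0x83: "burst disable", 0x84: "query"}

# Event 1 is quoted in this thread; event 2 is an assumption.
EVENTS = {1: "wait for OBF=1", 2: "wait for IBF=0"}


def decode_status(status):
    """Names of the status bits that are set, lowest bit first."""
    return [name for bit, name in sorted(STATUS_BITS.items()) if status & bit]


def describe(line):
    """Translate the decimal values of an EC timeout message."""
    match = re.search(r"status=(\d+), expect_event=(\d+)", line)
    if match:
        status, event = int(match.group(1)), int(match.group(2))
        return "status 0x%02x %s, expected: %s" % (
            status, decode_status(status), EVENTS.get(event, "unknown"))
    match = re.search(r"command=(\d+)", line)
    if match:
        command = int(match.group(1))
        return "command 0x%02x (%s)" % (command, COMMANDS.get(command, "unknown"))
    return None
```

Read this way, status=0 means no flag is set at all, while the status=32 from the original report has only the SCI_EVT bit set. In both cases the output buffer never became full.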

Revision history for this message
TJ (tj) wrote :

I've improved the ec_debug code (and attached the update) so the init subsystem is reported, and it correctly deals with potential vprintk() buffer over-runs by flushing the ec_messages buffer in blocks of no more than 1020 characters. I added a dump_stack() followed by mdelay() but even with the video camera recording it the reports scroll up the screen far too fast (and blurred) to be readable even on freeze-frame.
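
Flushing in bounded blocks can be sketched as follows. This is a Python illustration rather than the patch itself: the 1020-character limit is taken from the description above, while the function name `flush_in_blocks`, the `emit` callback and the preference for breaking at a newline are assumptions made for the example.

```python
def flush_in_blocks(text, emit, limit=1020):
    """Pass text to emit() in pieces of at most `limit` characters."""
    start = 0
    while start < len(text):
        end = min(start + limit, len(text))
        if end < len(text):
            # Prefer to break after the last newline inside the block so
            # that no message is cut in half; otherwise make a hard cut.
            newline = text.rfind("\n", start, end)
            if newline > start:
                end = newline + 1
        emit(text[start:end])
        start = end
```

Each call to emit() then stays within the limit regardless of how much has accumulated in the buffer.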

From a successful session however, here's the relevant part:

[ 45.084736] ACPI: bus type pci registered
[ 45.084867] PCI: Using configuration type 1
[ 45.086043] ACPI: EC: acpi_init acpi_ec_ecdt_probe()
[ 45.086045] ACPI: EC: acpi_init make_acpi_ec()
[ 45.086046] ACPI: EC: Look up EC in DSDT
[ 45.092877] ACPI: Interpreter enabled
[ 45.092939] ACPI: (supports S0 S3 S4 S5)
[ 45.093194] ACPI: Using IOAPIC for interrupt routing
[ 45.093362] ACPI: EC: acpi_scan_init acpi_boot_ec_enable()
[ 45.093363] ACPI: EC: acpi_scan_init acpi_boot_ec_enable() call ec_install_handlers(boot_ec)
[ 45.093364] ACPI: EC: acpi_scan_init ec_install_handlers(<ec>) ec->handlers_installed=0
[ 45.093365] ACPI: EC: acpi_scan_init ec_install_handlers() acpi_install_gpe_handler() succeeded
[ 45.093400] ACPI: EC: acpi_scan_init acpi_ec_space_handler(<args...>)
[ 45.093402] ACPI: EC: acpi_scan_init acpi_ec_read(<ec>, <address>, <data>)
[ 45.093403] ACPI: EC: acpi_scan_init acpi_ec_transaction(<ec>, 128, <wdata>, 1, <rdata>, 1, 0)
[ 45.093404] ACPI: EC: acpi_scan_init acpi_ec_wait(<ec>, 2, 0)
[ 45.093405] ACPI: EC: acpi_scan_init acpi_ec_wait() else
[ 45.093408] ACPI: EC: acpi_scan_init acpi_ec_transaction_unlocked(<ec>, 128, <wdata>, 1, <rdata>, 1, 0)
[ 45.093411] ACPI: EC: acpi_scan_init acpi_ec_wait(<ec>, 2, 0)
[ 45.093412] ACPI: EC: acpi_scan_init acpi_ec_wait() else
[ 45.093559] ACPI: EC: acpi_scan_init acpi_ec_wait(<ec>, 1, 0)
[ 45.093560] ACPI: EC: acpi_scan_init acpi_ec_wait() else
[ 45.093618] ACPI: EC: acpi_scan_init acpi_ec_gpe_handler(<data>)
[ 45.093622] ACPI: EC: non-query interrupt received, switching to interrupt mode
[ 45.093930] ACPI: EC: acpi_scan_init acpi_ec_gpe_handler(<data>)
[ 45.093942] ACPI: EC: acpi_scan_init acpi_ec_space_handler(<args...>)
[ 45.093943] ACPI: EC: acpi_scan_init acpi_ec_read(<ec>, <address>, <data>)
[ 45.093944] ACPI: EC: acpi_scan_init acpi_ec_transaction(<ec>, 128, <wdata>, 1, <rdata>, 1, 0)
[ 45.093945] ACPI: EC: acpi_scan_init acpi_ec_wait(<ec>, 2, 0)
[ 45.093948] ACPI: EC: acpi_scan_init acpi_ec_transaction_unlocked(<ec>, 128, <wdata>, 1, <rdata>, 1, 0)
[ 45.093951] ACPI: EC: acpi_scan_init acpi_ec_wait(<ec>, 2, 0)
[ 45.094339] ACPI: EC: acpi_scan_init acpi_ec_gpe_handler(<data>)
[ 45.094350] ACPI: EC: acpi_scan_init acpi_ec_wait(<ec>, 1, 0)
[ 45.094777] ACPI: EC: acpi_scan_init acpi_ec_gpe_handler(<data>)
[ 45.094800] ACPI: EC: acpi_scan_init acpi_ec_space_handler(<args...>)
[ 45.094801] ACPI: EC: acpi_scan_init acpi_ec_read(<ec>, <address>, <data>)
[ 45.094802] ACPI: EC: acpi_scan_init acpi_ec_transaction(<ec>, 128, <wdata>, 1, <rdata>, 1, 0)
[ 45.094803] ACPI: EC: acpi_scan_init acpi_ec_wait(<ec>, 2, 0)
[ ...

Revision history for this message
Stefan Bader (smb) wrote :

Too bad, hmm, maybe an mdelay in dump_stack might work... Otherwise, instead of the backtrace, printing the addresses of the reads might allow a comparison with the working case. At least it should show which access fails...

Revision history for this message
TJ (tj) wrote :

The additional debug messages were enough to pinpoint the root-cause of the issue and I had a proof-of-concept patch succeed in fixing the issue last thing yesterday.

Today I worked through the commit logs from mainline and located the two commits that fix the issue in later kernels. I built and tested the kernel with those commit patches applied and couldn't provoke the bug.
Then, reviewing the comments on this bug, I realised that Stefan had referred to the same two commits on 9th July but, because there was no explicit link to a test kernel in his comment, I'd somehow overlooked it and not tried them.

I've currently got my PPA building kernel packages with the patches applied so that others can test the kernel easily. If/when the builds are complete please test the kernel package by adding my PPA to your apt sources:

$ sudo su
$ echo "deb http://ppa.launchpad.net/intuitivenipple/ubuntu hardy main" > /etc/apt/sources.list.d/intuitivenipple-ppa.list
$ apt-get update
$ exit

Update Manager should now discover the package and offer it for installation.

In case the PPA builds fail (kernel-builds are still a dark art!) I'm also building all binary packages. Once built they will be found at http://tjworld.net/ubuntu/bugs/lp191137/

For reference, the fix (if it solves everyone else's problems too) is to cherry-pick the commits:

b3b233c7d948a5f55185fb5a1b248157b948a1e5 Thu Jan 10 20:50:12 2008 -0500 ACPI: EC: Some hardware requires burst mode to operate properly
3e71a87d03055de0b8c8e42aba758ee6494af083 Thu Jan 10 20:49:14 2008 -0500 ACPI: EC: Do the byte access with a fast path

I was trying to figure out why they didn't get into the ubuntu-hardy tree since according to the logs there were several pulls from upstream after the commit date.

Please report your experiences with this test kernel package.

Revision history for this message
Hao Zhe XU (haozhe.xu3) wrote :

I tried the new kernel smb4 but wireless becomes weird: it detects some WIFI networks which I have never seen before, and none of the previously working WIFI networks are detected.

Revision history for this message
Stefan Bader (smb) wrote :

I had those patches included in smb6 at http://people.ubuntu.com/~smb/bug191137/.

@Hao, please could you try either that one or the one from TJ's PPA. Thanks.

Revision history for this message
wvengen (wvengen) wrote :

Just booted smb6 successfully with power cord attached :)
  [ 14.058061] ACPI: bus type pci registered
  [ 14.058143] PCI: Using configuration type 1
  [ 14.059446] ACPI: EC: acpi_ec_ecdt_probe() created boot_ec
  [ 14.059505] ACPI: EC: Look up EC in DSDT
  [ 14.065305] ACPI: EC: no EC._INI
  [ 14.067351] ACPI: Interpreter enabled
  [ 14.067407] ACPI: (supports S0 S3 S4 S5)
  [ 14.067424] ACPI: Using IOAPIC for interrupt routing
  [ 14.067554] ACPI: EC: acpi_boot_ec_enable()
  [ 14.068018] ACPI: EC: non-query interrupt received, switching to interrupt mode
  [ 14.076076] ACPI: EC: acpi_boot_ec_enable() successful
  [ 14.076132] ACPI: after acpi_boot_ec_enable() call
  [ 14.096516] ACPI: EC: GPE = 0x17, I/O: command/status = 0x66, data = 0x62
  [ 14.096520] ACPI: EC: driver started in interrupt mode
I can't find TJ's kernels yet, however (neither ppa nor tjworld)
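
For comparing boots, the time spent between two markers in such a log can be read off with a few lines of Python. This is a sketch that assumes the usual "[ seconds] text" dmesg prefix; the function name `elapsed_between` is made up for the example.

```python
import re

STAMP = re.compile(r"^\s*\[\s*(\d+\.\d+)\]\s*(.*)$")


def elapsed_between(lines, start_text, end_text):
    """Seconds from the first line containing start_text to the next
    later line containing end_text, or None if either is missing."""
    start = None
    for line in lines:
        match = STAMP.match(line)
        if match is None:
            continue
        when, text = float(match.group(1)), match.group(2)
        if start is None:
            if start_text in text:
                start = when
        elif end_text in text:
            return when - start
    return None
```

Applied to the log above with "acpi_boot_ec_enable()" and "acpi_boot_ec_enable() successful" as markers, it gives roughly 8.5 ms for enabling the boot EC.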

Revision history for this message
Stefan Bader (smb) wrote :

Ok, for completeness there is a 19.35smb1 kernel which incorporates the latest Hardy release with only those two upstream changes (without any debugging code). If you are happy with that I will try to get this into the next release. Time is tight, however...

Revision history for this message
TJ (tj) wrote :

The PPA builds failed (I somehow managed to mess up the ABI checks in
the upload package!) and my local builds for some reason didn't
auto-upload to my server after the builds completed.

It looks as if Stefan has it covered with his packages however so I'll
leave it to him since there's an SRU for this now.

Revision history for this message
Stefan Bader (smb) wrote :

SRU justification:

Impact: On several Sony laptop models the changes introduced by upstream
commit c04209a7948b95e8c52084e8595e74e9428653d3 to enable the EC handler
during scan will cause those machines to hang on boot.

Fix: There was a work-around to remove the quiet option on boot but that very much depended on timing and probably won't always work. Two upstream patches have been identified that solve the issue. Patches committed to Hardy as bdbcd262d8a378984716e3bc3bdbfd70841303ab and 40e55e8f3bb152df0e883dad572d263647f3c056.

Testcase: Boot a Hardy kernel on one of the affected laptops with the
quiet option active (default) and it will hang. With the fixes this
does not happen.

Changed in linux:
status: In Progress → Fix Committed
Revision history for this message
Hao Zhe XU (haozhe.xu3) wrote :

I have to confirm something:
1 do I try 34smb6 or 35smb1?
2 do I just double click the deb file and follow the instruction? Last time I did in this way but wireless does not work any more.
3 Is it possible to reverse back to previous kernel after I installed this patched kernel?
Thanks!

Revision history for this message
Hao Zhe XU (haozhe.xu3) wrote :

I mentioned the problem with smb4 before, I have a photo of it.

Revision history for this message
marco-peroverde (marco-peroverde) wrote :

I have had that problem (https://bugs.launchpad.net/ubuntu/+source/linux/+bug/191137/comments/124) too
since I installed Hardy. But I think it's independent of the bug discussed here.

The system hangs during boot with the messages Hao posted in that picture. Normally it boots fine
if you then reboot.

For me it's happening on a fe41z.

Revision history for this message
Stefan Bader (smb) wrote :

@Hao

1. Better 35smb1
2. Actually I never did it that way but by using 'dpkg --install'
3. Yes. Depending on which kernel you want download one of the linux-image packages at
http://archive.ubuntu.com/ubuntu/pool/main/l/linux/ and call the dpkg command with that.

@Weisswurst

was that with any of the debug kernels or just with the original one? If that was the stock ubuntu kernel, maybe you could try the 35smb1 kernel as well? If the problem persists there we have to look closer and decide whether this is something new or might relate to the current problem.

Revision history for this message
marco-peroverde (marco-peroverde) wrote :

Well this Vaio is my working machine. I can not try stuff during exams...
The problem occurs on every stock kernel since Hardy. But it occurs randomly.
Out of ten boots only one fails, if not fewer. But if it fails there is a good chance that it fails again on the first reboot.
But then I sometimes don't see this message for weeks.

That's why I never mentioned it.
I'm wondering why TJ never sees this message because he is also using a fe41z...

Revision history for this message
marco-peroverde (marco-peroverde) wrote :

Well, at the end of the month I could try some things. Then my exams are finished.
But I don't think I can keep rebooting the Vaio until this failure finally appears.

Revision history for this message
Hao Zhe XU (haozhe.xu3) wrote :

Yes that problem happens on standard kernel.

And 35smb1 does work, I only tried it once but it worked, no hangs any more.

I attached my dmesg output.

So when will the patch be formally released? At that time, do I keep using the patched 35smb1 kernel or use the formally released one (do I revert to the standard kernel and then upgrade, or will the upgrade program automatically replace my kernel with the latest standard one)?

Revision history for this message
Stefan Bader (smb) wrote :

@Hao,

you could stay with that kernel. It should be more or less the same as the upcoming security update (though that could overrule this kernel since there are chances that, due to build problems, the security release one might be 19.36). The fix has now been committed and should show up in the next point release (hardy-proposed). But I can't give an exact time estimate for that.

Revision history for this message
Hao Zhe XU (haozhe.xu3) wrote :

After installing the kernel, my update manager told me there are kernel updates, is it just the previous standard kernel? When the patch is released, do I update via update manager?
Thanks!

Revision history for this message
Stefan Bader (smb) wrote :

@Hao

the kernel you get offered is the 19.36 one which is the same as the 35smb1 without the fix for the booting problem. So you might want to stay with the smb1 kernel. The next kernel which should also have this fix in will be the next stable release update. To get that as soon as possible you have to enable hardy-proposed in your installation sources. You will then just use the update manager for that.

Steve Langasek (vorlon)
Changed in linux:
assignee: nobody → stefan-bader-canonical
importance: Undecided → High
status: New → In Progress
Revision history for this message
Stefan Bader (smb) wrote :

40e55e8f3bb152df0e883dad572d263647f3c056 + bdbcd262d8a378984716e3bc3bdbfd70841303ab in Hardy.

Changed in linux:
status: In Progress → Fix Committed
Revision history for this message
Steve Langasek (vorlon) wrote :

Accepted into -proposed, please test and give feedback here. Please see https://wiki.ubuntu.com/Testing/EnableProposed for documentation how to enable and use -proposed. Thank you in advance!

Revision history for this message
Keith Drummond (kd353) wrote :

Hi, thanks everyone for all the work you have been putting into this!

If I want to try to upgrade to Hardy again (from Gutsy) can I just hit the upgrade button in update manager and this new kernel will be incorporated or do I have to wait until the next point release of Hardy (8.6.2??) and make a fresh install?

OR can I fresh install Hardy, add 'noacpi' to be able to turn on the computer then follow the previous post instructions, then that should in theory fix the problem when I restart the computer?

Revision history for this message
Stefan Bader (smb) wrote :

If you do the upgrade, make sure you have hardy-proposed enabled in your installation sources and also either wait maybe until the end of next week or make sure that all the needed packages are there. The uploads are still in progress and you want to have matching LUM and LRM packages ready.

Or if you like to be on Hardy faster, you can install freshly and work around the problem (noacpi seems a bit harsh, most people just had to remove the quiet option from the grub commandline) and then go for the 35smb1 kernel I prepared. Then (if hardy-proposed is enabled) you will get updated as soon as the packages are ready.

Revision history for this message
Thomas McKay (tom-mckay1) wrote : Re: [Bug 191137] Re: [Hardy] ACPI Embedded Controller (EC) stops boot when kernel boot 'quiet' option is enabled or AC power is connected

My deepest thanks go out to everybody involved in solving this showstopper
of a bug.
After enabling proposed and upgrading the kernel this bug is just a bad
dream now, a problem of the past.

THANK YOU!

Revision history for this message
Vladimir Meremyanin (v-stiff) wrote :

Awesome!
Works for me too, can confirm it.

Thanks guys!

On Thu, Jul 24, 2008 at 12:59, Martin Pitt <email address hidden> wrote:

> ** Tags added: verification-done
>
> ** Tags removed: verification-needed
>
> --
> [Hardy] ACPI Embedded Controller (EC) stops boot when kernel boot 'quiet'
> option is enabled or AC power is connected
> https://bugs.launchpad.net/bugs/191137
> You received this bug notification because you are a direct subscriber
> of the bug.
>

Revision history for this message
Keith Drummond (kd353) wrote :

THANK YOU! THANK YOU! THANK YOU!

I am confirming that the fix worked on my laptop (Sony Vaio VGN-N31S).

I made a fresh install of 8.04, deleted quiet option, booted to live session, installed Ubuntu, restarted.

After restart I hit 'esc', edited the grub command lines, booted into hardy, enabled the 'proposed' updated, then proceeded to update, then restarted.

After restart everything went perfect, not a single problem. I am here writing about it from Hardy Heron.

Did I say thank you??

Thanks again for all the effort you put into this fix.

Revision history for this message
neffets (2-launchpad-net-neffets-de) wrote :

Hey,

boot hanging occurs here too on non-Vaio laptop
I have a:
  HP Pavilion dv6000 (dv6500)
  CPU: AMD Turion(tm) 64 X2 Mobile Technology TL-56 (reported as running at 800 MHz)

It hangs at BOOT when AC-power is connected.
- then I can wait for very long time,
OR: simply press two times the AC-power-button and it will proceed immediately.

Attached are dmesg and my DSDT.
Here are the few lines around the point where the AC-power-button press is needed:

...
[ 0.099536] ACPI: Core revision 20080321
[ 0.108006] ACPI: setting ELCR to 0200 (from 0ca0)
[ 0.112007] CPU0: AMD Turion(tm) 64 X2 Mobile Technology TL-56 stepping 02
[ 0.112007] Booting processor 1/1 ip 6000
[ 0.120007] Initializing CPU#1
[ 0.120007] Calibrating delay using timer specific routine.. 3600.35 BogoMIPS (lpj=7200700)
[ 0.120007] CPU: L1 I Cache: 64K (64 bytes/line), D cache 64K (64 bytes/line)
[ 0.120007] CPU: L2 Cache: 512K (64 bytes/line)
[ 0.120007] CPU 1(2) -> Core 1
[ 0.120007] AMD C1E detected late. Force timer broadcast.
[ 0.199961] CPU1: AMD Turion(tm) 64 X2 Mobile Technology TL-56 stepping 02
[ 0.200012] Brought up 2 CPUs
[ 0.200012] Total of 2 processors activated (7204.70 BogoMIPS).
[ 0.200012] CPU0 attaching sched-domain:
[ 0.200012] domain 0: span 0-1
[ 0.200012] groups: 0 1
[ 0.200012] CPU1 attaching sched-domain:
[ 0.200012] domain 0: span 0-1
[ 0.200012] groups: 1 0
[ 0.200012] net_namespace: 644 bytes
[ 0.200012] Booting paravirtualized kernel on bare hardware
[ 0.200012] NET: Registered protocol family 16
[ 0.200012] EISA bus registered
[ 0.200012] ACPI: bus type pci registered
[ 0.200012] PCI: MCFG configuration 0: base e0000000 segment 0 buses 0 - 4
[ 0.200012] PCI: MCFG area at e0000000 reserved in E820
[ 0.200012] PCI: Using MMCONFIG for extended config space
[ 0.200179] PCI: Using configuration type 1 for base access
[ 0.200350] Setting up standard PCI resources
[ 0.204013] ACPI: EC: Look up EC in DSDT
[ 0.206659] ACPI: BIOS _OSI(Linux) query ignored via DMI
[ 0.207788] ACPI: Interpreter enabled
[ 0.207953] ACPI: (supports S0 S3 S4 S5)
[ 0.208185] ACPI: Using PIC for interrupt routing
[ 0.208185] ACPI: EC: non-query interrupt received, switching to interrupt mode

       HERE pressing shortly the AC-power-button will proceed immediately (otherwise hang for long time)

[ 0.224186] ACPI: EC: GPE = 0x10, I/O: command/status = 0x66, data = 0x62
[ 0.224186] ACPI: EC: driver started in interrupt mode
[ 0.224186] ACPI: PCI Root Bridge [PCI0] (0000:00)
[ 0.224186] PCI: Transparent bridge - 0000:00:08.0
[ 0.224498] ACPI: PCI Interrupt Routing Table [\_SB_.PCI0._PRT]
[ 0.224596] ACPI: PCI Interrupt Routing Table [\_SB_.PCI0.P2P0._PRT]
[ 0.224619] ACPI: PCI Interrupt Routing Table [\_SB_.PCI0.XVR1._PRT]
[ 0.224654] ACPI: PCI Interrupt Routing Table [\_SB_.PCI0.XVR2._PRT]
[ 0.260188] ACPI: PCI Interrupt Link [LNK1] (IRQs 5 7 *10 11 14 15)
[ 0.260188] ACPI: PCI Interrupt Link [LNK2] (IRQs 5 7 10 *11 14 15)
[ 0.260894] ACPI: PCI Interrupt Link [LNK3] (IRQs 5 7 10 11 14 15) *0, disabled.
[ 0.261800] ACPI: PCI Interrupt Link [LNK4] (IRQs 5 7 10 11 ...


Revision history for this message
neffets (2-launchpad-net-neffets-de) wrote :

Hey,

here the DSDT.aml from my laptop "HP Pavilion dv6500"
chipset claims to be MCP67

Kernel:
* standard 2.6.24-19-386 => no problem
* self-compiled 2.6.25.6 (on 20080615) => no problem
BUT
* now kernel 2.6.26 (first release compiled on 20080716) . HANGs
(-rw-r--r-- 1 root src 49441874 2008-07-14 00:43 linux-2.6.26.tar.bz2)

Revision history for this message
neffets (2-launchpad-net-neffets-de) wrote :
Revision history for this message
neffets (2-launchpad-net-neffets-de) wrote :

hp pavilion dv6500
Booting with kernel options:

without ac-power it hangs too
  after ac-power-button press (or wlan-button change) it boots further
  and hangs then later

[ 25.334075] ACPI: Battery Slot [BAT0] (battery present)
[ 25.335892] ACPI: WMI: Mapper loaded
[ 25.603699] ACPI: device:25 is registered as cooling_device2
[ 25.603699] input: Video Bus as /class/input/input7
[ 25.651769] ACPI: Video Device [UVGA] (multi-head: yes rom: no post: no)
[ 25.714598] pci_hotplug: PCI Hot Plug PCI Core version: 0.5
[ 25.908925] shpchp: Standard Hot Plug PCI Controller Driver version: 0.4
[ 26.165161] ricoh-mmc: Ricoh MMC Controller disabling driver
[ 26.166967] ricoh-mmc: Copyright(c) Philip Langdale
[ 26.169028] ricoh-mmc: Ricoh MMC controller found at 0000:02:05.2 [1180:0843] (rev 12)
[ 26.169047] ricoh-mmc: Controller is now disabled.
[ 26.230659] input: PC Speaker as /class/input/input8
[ 175.522250] BUG: soft lockup - CPU#1 stuck for 140s! [swapper:0]
[ 175.522250] Modules linked in: pcspkr(+) soundcore ricoh_mmc mmc_core k8temp shpchp pci_hotplug video output wmi battery ac button evdev ext3 jbd mbcache sg sr_mod sd_mod cdrom ata_generic pata_acpi usbhid hid pata_amd ahci ssb libata ohci1394 ehci_hcd ohci_hcd ieee1394 scsi_mod forcedeth dock usbcore thermal processor fan fuse
[ 175.522250]
[ 175.522250] Pid: 0, comm: swapper Not tainted (2.6.26-20080716 #2)
[ 175.522250] EIP: 0060:[<c01176a2>] EFLAGS: 00000246 CPU: 1
[ 175.522250] EIP is at native_safe_halt+0x2/0x10

175.52... then I re-connected the ac-power cord to the laptop

Revision history for this message
Stefan Bader (smb) wrote :

@neffets

I think this should go into a new bug since your problems started with a later kernel. The symptoms look similar but might be unrelated. If possible you should also try to run a vanilla upstream kernel to verify whether the problem still exists.

Revision history for this message
Cristian T (cristroncos) wrote :

Hey guys...I can confirm the fix works perfectly on my Sony Vaio VGN-C240E. Thanks a lot for all the work you put into this....It's really appreciated!

Revision history for this message
Jonas Steinmann (steinmann-jonas) wrote :

Works perfectly on my VAIO VGN-N21Z :)

Revision history for this message
Julian Alarcon (julian-alarcon) wrote :

@neffets

Maybe you are affected by this bug:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/204996

Revision history for this message
neffets (2-launchpad-net-neffets-de) wrote :

Hello,

acpi boots now without pause with the new kernels
    2.6.25.15
and
    2.6.27-rc3 (rc2 too)

Revision history for this message
bohemier (bohemier) wrote :

Hello everyone,

Thanks for the help, the proposed patches fixed the problem on my sony vaio vgn-n385qe

regards

Revision history for this message
bigDs54 (big-d-rumblin) wrote :

Sorry to post here, I'm a bit new to the Linux experience. How do I remove the quiet option and then enable 'proposed'? I am using a Sony Vaio VGN-NR430 and trying to run Ubuntu 8.04.1 i386. Using the live CD, it freezes as soon as I pick either 'use without installing', 'check CD', or 'install'. Any help would be appreciated. -d

Revision history for this message
Stefan Bader (smb) wrote :

To remove the quiet option on the live/install CDs, press F6 (Other Options) and remove the quiet keyword.
Proposed is enabled from System->Administration->Software Sources. Activate the "pre-release updates" box under the updates tab.
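
What removing the keyword amounts to can be shown on a kernel command line string. This is a tiny Python sketch; the function name `drop_boot_option` and the example command line are made up, and the real boot entry on a given machine will look different.

```python
def drop_boot_option(cmdline, option="quiet"):
    """Return the kernel command line without the given bare option."""
    return " ".join(word for word in cmdline.split() if word != option)


if __name__ == "__main__":
    # Hypothetical boot line, only for illustration.
    print(drop_boot_option("root=/dev/sda1 ro quiet splash"))
```

Only the bare word is removed; every other option stays in its original order.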

Revision history for this message
Martin Pitt (pitti) wrote :

linux 2.6.24-21 copied to hardy-updates.

Changed in linux:
status: Fix Committed → Fix Released
Revision history for this message
Leann Ogasawara (leannogasawara) wrote :

The above mentioned patches are already in Intrepid as well so marking from Fix Committed to Fix Released. Thanks.

Changed in linux:
status: Fix Committed → Fix Released
Changed in linux:
importance: Unknown → High