Ubuntu

Jaunty will not boot on Dell Optiplex 760 unless hpet=disable

Reported by William Cattey on 2009-03-25
This bug affects 1 person
Affects: linux (Ubuntu)
Importance: Undecided
Assigned to: Unassigned

Bug Description

MIT is switching its standard recommended desktop configuration from the Dell Optiplex 755 which has
reached end of life to the new Dell Optiplex 760.

Unfortunately there seems to be a nasty kernel bug around HPET.

Repeat by:

Boot today's Jaunty Live CD. (I used 3/24/09 and also as another test 1/13/09)
Select your language
Choose "Try Ubuntu..."

Actual results:

Blank screen

Expected results:

Working Jaunty.

Work-around:

Select the boot option acpi=off

----

We know from other bug reports, for example
http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=515975
that there is a problem with HPET. That Debian bug requests incorporation of a one-line kernel change
so that an oops message is displayed only once instead of on every clock tick.

It does not, however, get HPET working.

This problem manifests under Intrepid with much spewage of kernel oops messages and/or terribly slow operation.
Again, the work-around of turning off ACPI seems to help.

The hardware has ICH10 motherboard chips. How new are they?

Also I see a report that the PNP-BIOS is not working, but I do not know if this is relevant.

William Cattey (wdc-mit) wrote :

I have done more testing and learned much.
Setting kernel option
    hpet=disable
works around the problem.

Also, if you hit the power switch to create an interrupt, oh say about 40 times, you can get the bootstrap to the point where it sees the keyboard. At that point you can generate interrupts with the keyboard to the point where gdm starts, and sees the mouse.

After that, unless YOU create interrupts, the system sees no clock interrupts, and
does nothing. It's fun to type "ls<enter>" and have it produce no output till you move the mouse, or hit the ctrl key.

William Cattey (wdc-mit) wrote :

Although Intrepid still runs through the HPET code path, setting hpet=disable is the smallest scope change to get Jaunty working.
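
One way to see what the kernel ended up using for timekeeping is the clocksource entries in sysfs. The following is a minimal sketch, assuming the standard `/sys/devices/system/clocksource/clocksource0` directory is present; the helper names are hypothetical, and the expectation that `hpet` disappears from the available list when booting with hpet=disable is an assumption, not something verified in this report.

```python
from pathlib import Path

# Standard sysfs location for clocksource information on Linux.
CLOCKSOURCE_DIR = Path("/sys/devices/system/clocksource/clocksource0")


def parse_clocksources(available_text, current_text):
    """Return (available list, current name, whether hpet is offered)."""
    available = available_text.split()
    current = current_text.strip()
    return available, current, "hpet" in available


def read_clocksources(base=CLOCKSOURCE_DIR):
    """Read the live values; returns None when sysfs is not available."""
    try:
        available_text = (base / "available_clocksource").read_text()
        current_text = (base / "current_clocksource").read_text()
    except OSError:
        return None
    return parse_clocksources(available_text, current_text)


if __name__ == "__main__":
    # Illustrative values only, not taken from the machines in this report.
    print(parse_clocksources("tsc hpet acpi_pm\n", "hpet\n"))
    print(read_clocksources())
```

Comparing the result of a normal boot against a boot with hpet=disable would show whether the option took effect.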

andrew m. boardman (amb-mit) wrote :

I have one of these new Dells as well; same problems here. I don't see a way to make Intrepid usable without disabling acpi completely. (...and losing a CPU.)

To answer a question above: a lot of the ICH10 support is fairly new.

I'm not convinced that the real problem isn't in Dell's firmware. I have the A02 bios released on 25 February 2009, as, I suspect, does the original reporter.

Mario Limonciello (superm1) wrote :

What's the last Jaunty image you've run? Can you check whether it works with Load BIOS Defaults (so as to rule out a post-install configuration-related item)?

William Cattey (wdc-mit) wrote :

WOW!

I was skeptical that "Load BIOS Defaults" was going to make any difference, but INDEED, Jaunty started right up.

The image is the most recent one posted to cdimage.ubuntu.com/daily-live/current
which is dated 24-Mar-2009, filename, "jaunty-desktop-i386.iso"

QUESTION: Any suggestions on how we can track down what BIOS setting was trashed by whom?

Mario Limonciello (superm1) wrote :

Hi William:

I'd recommend going through and just flipping different settings in the
BIOS to find the one that is causing the problems. Once we know which
one is doing it, we should be able to document it in the short term, and
in the long term fix whatever it breaks.

--
Mario Limonciello
*Dell | Linux Engineering*
<email address hidden>

William Cattey (wdc-mit) wrote :

The prospect of flipping a BIOS bit, booting the CD, and repeating for every bit seems rather daunting.

On the assumption that it was an Intrepid post-install nastiness with some software MIT integrated, I used biosdecode, dmidecode, dumpCmos and dumpSmbios to take
a snapshot of as much BIOS state as I could, and then did pretty much the same install I did that seemed to get me into this state of affairs.

Well, the wipe and install went fine with no HPET problems.

I did touch a couple of BIOS settings, changing the boot order and enabling PXE, but those changes didn't provoke any HPET problems.

It is as if Dell, after installing the software, set some BIOS value different from what one gets when one does "Load BIOS Defaults", and that THAT is the root of the problem.

Mario Limonciello (superm1) wrote :

Hi William:

This is possible, but it's also possible that MIT made some changes to
the BIOS settings too before providing you the machine.

Towards your Dell Factory changing a setting:
Try toggling AHCI->ATA. If these systems were shipped with Windows,
it's quite possible that setting gets flipped in the factory to prevent
having to have another driver during factory windows installation.
Also, CPU XD support can get toggled in the factory (generally to off)
if it's known to give some problems with software that is supported on
the machine.

William Cattey (wdc-mit) wrote :

Actually, I was the one who cut open the box.
I also checked with the MIT folk who received the hardware from Dell.
The settings are as they came from the factory.

My current theory is that there are two defaults: "High Performance" defaults that are set at the factory before the system ships out, and "Always Works" defaults which are what you get when you choose the "Load BIOS Defaults".

I've also got a call to my Dell inside rep to see if MIT has any Configure To Order BIOS customizations going in before the systems ship out of Dell.

Even if there are any such possible deltas going on here from the Dell factory, the point stands there is a BIOS option that can cause HPET to not work. Once the actual option is identified, we can start to look at sustainable solutions.

William Cattey (wdc-mit) wrote :

With the help of a senior Tech at Dell, the relevant BIOS bit has been identified:

Under "Performance", the bit "C States Control" determines whether or not there will be a problem. If the additional processor sleep states are enabled by turning this option on, the HPET clock breaks.

I have Dell checking in on where in the manufacturing cycle this bit gets turned on.
It seems most likely that when MIT did "Custom Factory Integration" and enabled "Energy Star Setup", we got this bit enabled.

Mario: Armed with this bit of information, is there a clear way forward?

Mario Limonciello (superm1) wrote :

Hi William:

Did the technician mention at all whether this option is only available
in certain CPU configurations? I'm looking at a 760 w/ BIOS A02 right
now, and don't see any such option under Performance (or anywhere else
for that matter).

I see:
* Multi Core Support
* Intel SpeedStep
* Limit CPUID Value
* HDD Acoustic Mode

Turning on SpeedStep would be the most logical thing I think for
affecting C-states, but I don't see any changes in behavior with it.

William Cattey (wdc-mit) wrote :

I have a Core 2 Duo E8400 CPU.

The Dell Senior Tech said that the C States are available with CPU chips that have the "EIST" (Enhanced Intel SpeedStep Technology) processor option. He'd like to know which CPU you have, to re-confirm that the option isn't present for your CPU.

According to some google searching, the cpuinfo output should say "est" if that option is available.

Mario Limonciello (superm1) wrote :

Hi William:

It looks like it does claim to have EST:

processor : 0
vendor_id : GenuineIntel
cpu family : 6
model : 15
model name : Intel(R) Core(TM)2 Duo CPU E4600 @ 2.40GHz
stepping : 13
cpu MHz : 2400.000
cache size : 2048 KB
physical id : 0
siblings : 2
core id : 0
cpu cores : 2
apicid : 0
initial apicid : 0
fdiv_bug : no
hlt_bug : no
f00f_bug : no
coma_bug : no
fpu : yes
fpu_exception : yes
cpuid level : 10
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca
cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe nx lm
constant_tsc arch_perfmon pebs bts pni dtes64 monitor ds_cpl est tm2
ssse3 cx16 xtpr pdcm lahf_lm
bogomips : 4787.15
clflush size : 64
power management:
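
The check for the `est` flag can be scripted. The following is a minimal sketch that parses cpuinfo-style text; the function name `cpu_flags` is hypothetical, and it assumes the `flags` entry sits on a single line, as it does in `/proc/cpuinfo` itself (the output pasted above was wrapped in transit). As this thread shows, the presence of `est` alone does not mean the BIOS offers "C States Control".

```python
def cpu_flags(cpuinfo_text):
    """Collect the flags of the first processor entry in cpuinfo-style text."""
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "flags":
            return set(value.split())
    return set()


if __name__ == "__main__":
    # Abbreviated excerpt of the flags posted above, joined onto one line.
    sample = "flags : fpu vme de pse tsc msr monitor ds_cpl est tm2 ssse3\n"
    print("est" in cpu_flags(sample))
    try:
        with open("/proc/cpuinfo") as handle:
            print("est" in cpu_flags(handle.read()))
    except OSError:
        pass  # not on Linux, or /proc is unavailable
```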

William Cattey (wdc-mit) wrote :

Thanks for the details, Mario.

Apparently my CPU has the C States option and yours does not, but it isn't obvious what determines which CPUs have it. The Dell Tech is going to follow up to find out.

So anybody wanting to reproduce my problem will need a fancier CPU than yours, but it isn't yet clear if the upgrade all the way up to the fanciness of mine is necessary.

William Cattey (wdc-mit) wrote :

Here is more information on the C States from the Dell Senior Tech:

C3 Deep Sleep
Stops all CPU internal and external clocks

Pentium II and above, but not on Core 2 Duo E4000 and E6000;
AMD Turion 64

http://www.hardwaresecrets.com/article/611/4

C4 Deeper Sleep
Reduces CPU voltage

Pentium M and above, but not on Core 2 Duo E4000 and E6000 series;
AMD Turion 64

http://www.hardwaresecrets.com/article/611/5

When I read those hardwaresecrets articles, I didn't understand how a normally running CPU could get into state C3, but the Dell Senior Tech provided additional clarification and speculation:

C3 is different from S3 (suspend / standby).

C3 can be entered while the system is on and running the operating system (usually with no effect on system performance).

The next state, Sleep (C3), cuts all internal clock signals from the CPU, including the clocks from the bus interface unit and from the APIC. This means that when the CPU is in the Sleep mode it can’t respond to important requests coming from the CPU external bus, nor to interrupts.

That seems to mean that the CPU is not listening for interrupt requests over APIC which may be causing the HPET to malfunction.

C4 is reached after further "idle" conditions, and has to transition to C3, then back up to C0, which could take longer for the APIC to allow interrupts.
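
To see which of these C-states a running Linux system actually uses, the per-state counters in the cpuidle sysfs directory can be read. The following is a minimal sketch, assuming the kernel exposes `/sys/devices/system/cpu/cpu0/cpuidle/state*/name` and `time` (cumulative microseconds); the function names are hypothetical. Note that it reports each state's share of idle time only, not of wall-clock time, so the numbers will not match a tool that also counts time spent running.

```python
from pathlib import Path

# cpuidle statistics for the first CPU; other CPUs have sibling directories.
CPUIDLE_DIR = Path("/sys/devices/system/cpu/cpu0/cpuidle")


def residency_percent(times_us):
    """Convert {state name: microseconds} into {state name: percent of total}."""
    total = sum(times_us.values())
    if total == 0:
        return {name: 0.0 for name in times_us}
    return {name: 100.0 * t / total for name, t in times_us.items()}


def read_idle_times(base=CPUIDLE_DIR):
    """Read cumulative idle time per state for one CPU; {} if unavailable."""
    times = {}
    try:
        for state in sorted(base.glob("state*")):
            name = (state / "name").read_text().strip()
            times[name] = int((state / "time").read_text())
    except (OSError, ValueError):
        return {}
    return times


if __name__ == "__main__":
    # Made-up numbers to show the calculation.
    print(residency_percent({"C1": 250, "C3": 750}))
    print(residency_percent(read_idle_times()))
```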

----

So I guess the question to ask the Linux HPET driver gurus is: What do you do when you've got a CPU that can stop listening to APIC interrupts behind your back?

William Cattey (wdc-mit) wrote :

It was suggested to me out of band to look at the output of powertop.

So I enabled C States in the BIOS and got Jaunty running again (hitting the power switch 30-some times and the Ctrl key on the keyboard 500-some times to create interrupts to get it to where I could run powertop).

powertop showed the system was in the C3 state some of the time.
It reported interrupts from hpet3, hpet2, uhci_hcd:usb4, uhci_hcd:usb7, and Dell Laser mouse.

But it ONLY would refresh when I did something with the keyboard or mouse to get interrupts. The HPET 3 and HPET 2 interrupts are NOT waking the system out of the C3 state. Once a keyboard or mouse actually DOES wake the system, the enqueued HPET interrupt is handled.
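
The same observation can be cross-checked without powertop by comparing two snapshots of `/proc/interrupts`: if the HPET rows do not advance while the machine sits idle, the timer interrupts are not being delivered. The following is a minimal sketch; the function names are hypothetical and the sample rows are invented for illustration, not copied from the affected machine.

```python
def parse_interrupts(text):
    """Map each /proc/interrupts row label to its total count across CPUs."""
    totals = {}
    for line in text.splitlines():
        head, sep, rest = line.partition(":")
        if not sep:
            continue  # the header row listing CPU0, CPU1, ... has no colon
        fields = rest.split()
        counts = []
        for field in fields:
            if not field.isdigit():
                break  # per-CPU counts end where the description begins
            counts.append(int(field))
        description = " ".join(fields[len(counts):])
        totals[f"{head.strip()} {description}".strip()] = sum(counts)
    return totals


def hpet_delta(before_text, after_text):
    """Return how far each HPET-related row advanced between two snapshots."""
    before = parse_interrupts(before_text)
    after = parse_interrupts(after_text)
    return {key: after[key] - before.get(key, 0)
            for key in after if "hpet" in key.lower()}


if __name__ == "__main__":
    # Invented snapshots: the keyboard row advances, the HPET row does not.
    before = "      CPU0  CPU1\n  1:  120  0  IO-APIC-edge  i8042\n 28:  5000  10  HPET_MSI-edge  hpet2\n"
    after = "      CPU0  CPU1\n  1:  150  0  IO-APIC-edge  i8042\n 28:  5000  10  HPET_MSI-edge  hpet2\n"
    print(hpet_delta(before, after))
```

On a live system the two snapshots would be read from `/proc/interrupts` a few seconds apart.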

----

When I boot the same Jaunty LiveCD and add "hpet=disable", the output of powertop seems quite sensible. It updates correctly. It says that the system is spending nearly all of its time in state C3 with 60% of the interrupts coming from something called "<interrupt> extra timer interrupt". The next most frequent source of interrupts is "hrtimer_start (tick_sched_timer)".

Given this data from powertop, do you folks think we're talking a BIOS bug, a hardware bug, or a Linux bug?

William Cattey (wdc-mit) wrote :

Someone suggested out of band that I open a kernel.org bugzilla.

By way of preparation for doing that, I searched the existing bugzilla for "ACPI CPU C3 interrupt" and turned up additional insight:

In BZ 10409
http://bugzilla.kernel.org/show_bug.cgi?id=10409

it was suggested to leave HPET enabled and apply the boot option 'acpi_skip_timer_override'.

Running powertop with this option set, I am told the CPU is in C3 state 95.3% of the time, and that the wakeups are:

23% hpet3
18% hrtimer_start (tick_sched_timer)

I suspect that to those who understand how HPET ACPI interrupts work, this is probably the key insight into understanding what to do next.

Any suggestions for a next step for me to take?

Fabián Rodríguez (magicfab) wrote :

I've just received an Optiplex 760 and Jaunty Beta DVD ISO image runs just fine without any special kernel options. BIOS is A02 too.

Mario Limonciello (superm1) wrote :

Fabián:

Double check what CPU you have in the system. It needs to be one of the
higher end CPUs to be able to reproduce this. Also, with the higher end
CPU, you will have a BIOS option for it.

William Cattey (wdc-mit) wrote :

Fabián:

Also make sure that the BIOS setting, "C States Control" under "Performance" is set in the BIOS.
This setting is not enabled by default unless you go for an Energy Star Custom Factory Integration.
As Mario said, the option is only available for higher end processors.

Luis Fernandes (lfernandes) wrote :

I have an Optiplex 760 with Intrepid installed.
I did notice a slower than usual boot on this computer, but I didn't pay much attention until I tried the Jaunty CD, and that's how I found this bug report.

With Intrepid, sometimes, and only sometimes, I had to hit the keyboard during boot for the computer to resume. With the Jaunty CD I couldn't get it to boot at all.

The BIOS is A01 and I had "C States Control" enabled. After disabling it I noticed a faster boot in Intrepid, and the Jaunty CD worked.

I can play around with the machine to try to find any other clues, if that's a good idea. Just tell me what data you need me to gather.
I am using the 32bit version of the Ubuntu 9.04 release.

William Cattey (wdc-mit) wrote :

Luis:

It was under Intrepid that I first noticed clock problems myself.

In fact, the Intrepid LiveCD has an old enough version of the kernel
that you may see what I call, "Infinite spew," where the HPET code prints a kernel oops on every clock tick.

I read a lot of deltas on kernel.org dealing with HPET issues. Apparently it's very tricky to get everything right for HPET. The current Intrepid kernel lacks several cleanups of HPET, and so its operation is erratic. Jaunty contains the cleanest HPET code available, and this enabled us to understand the root cause of the HPET problems for the Dell 760:

Mis-directed clock interrupts when in the C3 energy saving CPU state.

I would say that reliable HPET operation with C States enabled is not possible with the Intrepid kernel. Worse, there is HPET code that gets used even if you disable HPET. For Intrepid, go into the BIOS and disable C States.

For Jaunty, the best work around is to set the boot option:
'acpi_skip_timer_override'
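
For anyone scripting this work-around across several machines, the following is a minimal sketch of checking a kernel command line for the option and adding it only when it is missing. The function names are hypothetical, and where the command line is stored (the bootloader configuration) differs between releases, so that part is deliberately left out.

```python
def has_option(cmdline, option):
    """True if the exact option token is already on the kernel command line."""
    return option in cmdline.split()


def add_option(cmdline, option):
    """Append option unless it is already present; existing order is kept."""
    if has_option(cmdline, option):
        return cmdline.strip()
    return f"{cmdline.strip()} {option}".strip()


if __name__ == "__main__":
    # Example command line, not taken from the affected machines.
    print(add_option("ro quiet splash", "acpi_skip_timer_override"))
    try:
        with open("/proc/cmdline") as handle:
            # Reports whether the running kernel was booted with the option.
            print(has_option(handle.read(), "acpi_skip_timer_override"))
    except OSError:
        pass  # not on Linux, or /proc is unavailable
```

Matching whole tokens rather than substrings avoids treating a longer, differently named parameter as the one being looked for.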

William Cattey (wdc-mit) wrote :

Good news! Dell has released an updated BIOS that remedies the problem.

Rev A03 for the Dell Optiplex 760 can be found at:

http://ftp.us.dell.com/bios/O760-A03.EXE

I have updated my test system to this revision, and confirmed that HPET
interrupts now correctly reach the Linux kernel with C states enabled.

This bug can now be CLOSED.

Changed in dell:
status: New → Fix Released
Changed in linux (Ubuntu):
status: New → Invalid
Changed in somerville:
status: New → Fix Released
no longer affects: dell

Timothy R. Chavez (timrchavez) wrote :

The bug task for the somerville project has been removed by an automated script. This bug has been cloned on that project and is available here: https://bugs.launchpad.net/bugs/1305936

no longer affects: somerville