ec2 kernel crash invalid opcode 0000 [#1]

Bug #651370 reported by Scott Moser
This bug affects 11 people
Affects          Status         Importance  Assigned to     Milestone
linux (Ubuntu)   Fix Released   Medium      Andy Whitcroft
Maverick         Fix Released   Medium      John Johansen

Bug Description

SRU Justification:

Impact: Booting an instance on certain Intel CPU models fails with a kernel panic, because the intel_idle driver does not seem to take into account that it is running in a virtualized environment. Only the intel_idle driver is affected.

Fix: Turning off intel_idle driver support for the virtual kernel image will let it use the generic idle driver as before. As this option is only changed for the virtual kernel package there is no risk of regression for the generic packages.

Testcase: Booting a large instance (with 68GB of memory) very likely results in this panic as the memory size will result in selecting certain base hardware with Intel CPUs. Turning the option off lets those instances boot again.
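
Whether a given kernel carries this config change can be checked from its build config. The sketch below assumes the usual Ubuntu layout, where the config is installed as /boot/config-<release>; `intel_idle_enabled` is a hypothetical helper name, not part of any package.

```shell
# intel_idle_enabled CONFIG_FILE
# Succeeds (exit 0) if the given kernel config builds in the intel_idle
# driver, fails otherwise. A disabled option normally appears as
# "# CONFIG_INTEL_IDLE is not set", which this pattern does not match.
intel_idle_enabled() {
    grep -q '^CONFIG_INTEL_IDLE=y' "$1"
}

# Example use against the running kernel (path follows the Ubuntu convention).
config="/boot/config-$(uname -r)"
if [ -r "$config" ] && intel_idle_enabled "$config"; then
    echo "intel_idle is built in; this kernel may hit the crash"
else
    echo "intel_idle is not enabled (or no config file was found)"
fi
```

This only inspects the config file; it says nothing about whether the underlying host would actually trigger the crash.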

---

I saw a kernel crash in Maverick RC testing. I will attach console output here. The system this report was filed from runs the same AMI, but the crash occurred on a c1.xlarge instance.

The crash begins like this:
[2725458.312511] invalid opcode: 0000 [#1] SMP
[2725458.312521] last sysfs file:
[2725458.312526] CPU 0
[2725458.312529] Modules linked in:
[2725458.312536]
[2725458.312541] Pid: 0, comm: swapper Not tainted 2.6.35-22-virtual #33-Ubuntu /
[2725458.312548] RIP: e030:[<ffffffff8130805c>] [<ffffffff8130805c>] intel_idle+0xac/0x180
[2725458.312565] RSP: e02b:ffffffff81a01ec8 EFLAGS: 00010046

But possibly the interesting piece of data is earlier in the log:
[ 0.000000] pcpu-alloc: s91520 r8192 d23168 u122880 alloc=30*4096
[ 0.000000] pcpu-alloc: [0] 0 [0] 1 [0] 2 [0] 3 [0] 4 [0] 5 [0] 6 [0] 7
[2725457.617698] Xen: using vcpu_info placement
[2725457.617705] Built 1 zonelists in Node order, mobility grouping on. Total pages: 1809808
[2725457.617707] Policy zone: Normal
[2725457.617711] Kernel command line: root=LABEL=uec-rootfs ro console=hvc0

There, we go from an uptime of 0.000000 to 2725457 seconds (757 hours) during boot.
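
For scanning a saved console log for this kind of jump, something like the following could be used. It is a minimal sketch: `find_time_jumps` is a hypothetical helper, the one-hour default threshold is arbitrary, and it assumes every line of interest starts with the usual `[seconds.micros]` kernel timestamp.

```shell
# find_time_jumps [THRESHOLD_SECONDS]
# Reads kernel log lines on stdin and prints every line whose timestamp is
# more than THRESHOLD_SECONDS (default 3600) later than the previous one.
find_time_jumps() {
    threshold=${1:-3600}
    awk -v limit="$threshold" '
        # Match a leading "[ seconds.micros]" kernel timestamp.
        match($0, /^\[ *[0-9]+\.[0-9]+\]/) {
            # Strip the brackets and force numeric conversion.
            ts = substr($0, 2, RLENGTH - 2) + 0
            if (seen && ts - prev > limit)
                printf "jump of %d s (%d h): %s\n", ts - prev, (ts - prev) / 3600, $0
            prev = ts
            seen = 1
        }'
}

# Example: find_time_jumps 3600 < console-output.txt
```

Fed the log excerpt above, it reports the jump on the "Xen: using vcpu_info placement" line as 2725457 s, or 757 h.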

ProblemType: Bug
DistroRelease: Ubuntu 10.10
Package: linux-image-2.6.35-22-virtual 2.6.35-22.33
Regression: No
Reproducible: No
ProcVersionSignature: Ubuntu 2.6.35-22.33-virtual 2.6.35.4
Uname: Linux 2.6.35-22-virtual x86_64
AlsaDevices: Error: command ['ls', '-l', '/dev/snd/'] failed with exit code 2: ls: cannot access /dev/snd/: No such file or directory
AplayDevices: Error: [Errno 2] No such file or directory
Architecture: amd64
ArecordDevices: Error: [Errno 2] No such file or directory
CurrentDmesg:

Date: Wed Sep 29 18:03:42 2010
Ec2AMI: ami-7a699c13
Ec2AMIManifest: (unknown)
Ec2AvailabilityZone: us-east-1c
Ec2InstanceType: t1.micro
Ec2Kernel: aki-427d952b
Ec2Ramdisk: unavailable
Frequency: This has only happened once.
Lspci:

Lsusb: Error: command ['lsusb'] failed with exit code 1:
ProcCmdLine: root=LABEL=uec-rootfs ro console=hvc0
ProcEnviron:
 PATH=(custom, user)
 LANG=en_US.UTF-8
 SHELL=/bin/bash
ProcModules: acpiphp 18752 0 - Live 0xffffffffa0000000
SourcePackage: linux

Scott Moser (smoser) wrote :
tags: added: iso-testing
Scott Moser (smoser)
description: updated
Jeremy Foshee (jeremyfoshee) wrote :

Hi Scott,

If you could also please test the latest upstream kernel available that would be great. It will allow additional upstream developers to examine the issue. Refer to https://wiki.ubuntu.com/KernelMainlineBuilds . Once you've tested the upstream kernel, please remove the 'needs-upstream-testing' tag. This can be done by clicking on the yellow pencil icon next to the tag located at the bottom of the bug description and deleting the 'needs-upstream-testing' text. Please let us know your results.

Thanks in advance.

    [This is an automated message. Apologies if it has reached you inappropriately; please just reply to this message indicating so.]

tags: added: kj-triage
Changed in linux (Ubuntu):
status: New → Incomplete
Scott Moser (smoser) wrote :
Scott Moser (smoser) wrote :
Scott Moser (smoser) wrote :

Moving this to confirmed, I attached 2 other console logs seeing this failure.
In both cases, the clock jumped forward by hundreds of thousands of seconds.

Changed in linux (Ubuntu):
status: Incomplete → Confirmed
Scott Moser (smoser)
Changed in linux (Ubuntu):
importance: Undecided → Medium
Brandon Black (blblack) wrote :

Having the same issue on c1.xlarge in us-east-1 (kernel crash on boot related to intel_idle). I've booted the Maverick release AMI several times on m1.large instances fine, but I seem to have a 50%+ failure rate getting it to initially boot without crashing on c1.xlarge. You're going to need to roll new AMIs when/if this bug is fixed, because the failure means an inability to boot far enough to get the kernel upgraded in the first place.

FWIW, I'm only even trying Maverick because of the unresolved kernel issues with Lucid on EC2 that have been hard to pin down (divide by zero panics in network-related areas of the kernel, apparent disk i/o lockups triggered by runaway CPU load triggered by apt somehow, etc...). What's going on with kernels on EC2? Is anyone at Ubuntu actually testing them?

Scott Moser (smoser) wrote : Re: [Bug 651370] Re: ec2 kernel crash invalid opcode 0000 [#1]

On Mon, 25 Oct 2010, Brandon Black wrote:

> Having the same issue on c1.xlarge in us-east-1 (kernel crash on boot
> related to intel_idle). I've booted the Maverick release AMI several
> times on m1.large instances fine, but I seem to have a 50%+ failure rate
> getting it to initially boot without crashing on c1.xlarge. You're

In my experience the failure rate is much lower than 50%. I've run literally
hundreds of instances. This bug seems to hit in fits and starts.
The kernel team is interested in fixing these bugs.

> going to need to roll new AMIs when/if this bug is fixed, because the
> failure means inability boot far enough to get the kernel upgraded in
> the first place.

Agreed.

> FWIW, I'm only even trying Maverick because of the unresolved kernel
> issues with Lucid on EC2 that have been hard to pin down (divide by zero
> panics in network-related areas of the kernel, apparent disk i/o lockups
> triggered by runaway CPU load triggered by apt somehow, etc...). What's

Could you please open a bug? Use ubuntu-bug /boot/vmlinuz-$(uname -r).
And please attach console output of a kernel panic.
I've not personally seen the bug you're describing.

> going on with kernels on EC2? Is anyone at Ubuntu actually testing
> them?

We do test the kernels, our test suite
(https://code.launchpad.net/~ubuntu-on-ec2/ubuntu-on-ec2/ec2-test) can
admittedly be improved, but prior to any release we launch dozens of
instances, spanning all sizes in all regions. I recently began
publishing test results at
https://code.launchpad.net/~ubuntu-on-ec2/ubuntu-on-ec2/ec2-test-results .

Brandon Black (blblack) wrote :

I tried to look in more detail at the crash this evening, because it's really causing me a lot of headache now. The most recent time I tried to boot a new c1.xlarge in us-east-1 this evening, I had to cycle through the crash/terminate/relaunch cycle 7 times before I got a working instance. I don't have a patch or answer yet, but I have a lot of hints:

1) c1.xlarge seems to be going through some changes of underlying CPU/hardware, which could explain the randomness. It probably depends which hardware you land on. The older ones are Xeon E5410 and the newer ones are Xeon E5506. So far the only times I've gotten non-crashed launches and thought to check, they've all been the E5410's.

2) The exact instruction throwing invalid opcode is MONITOR (0f 01 c8). The instructions MONITOR and MWAIT are used for efficient idling on newer CPUs, which I guess is the whole point of the intel_idle code we're crashing in.

3) These are not the sorts of instructions that can be executed in a VM environment like Xen without special support. Googling reveals discussions/patches to Xen for supporting these instructions in various ways (either as a hypercall encapsulating the whole monitor/wait pair, or masking the capability in CPUID so that Linux doesn't detect support and doesn't try to use it at all). Various related links:

http://lists.xensource.com/archives/html/xen-devel/2010-04/msg00043.html
http://markmail.org/thread/terab63w744x3m2r
http://www.sfr-fresh.com/unix/misc/xen-4.0.1.tar.gz:a/xen-4.0.1/docs/misc/cpuid-config-for-guest.txt

4) intel_idle can be effectively disabled from the kernel commandline with intel_idle.max_cstate=0 ( http://kerneltrap.org/mailarchive/git-commits-head/2010/5/28/40718 ), which will fall back on acpi_idle behavior. If it still crashes, there's also a commandline flag "idle=nomwait" which might prevent acpi_idle from using mwait as well.
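
Whether the guest sees MONITOR/MWAIT advertised (point 3) can be checked from inside an instance. This is a rough sketch: it assumes x86 Linux, where the feature bit normally shows up as the `monitor` flag in /proc/cpuinfo, and `has_monitor_flag` is a hypothetical helper name.

```shell
# has_monitor_flag [CPUINFO_FILE]
# Succeeds if the first "flags" line of a cpuinfo-style file lists "monitor".
# Defaults to /proc/cpuinfo when no file is given.
has_monitor_flag() {
    cpuinfo=${1:-/proc/cpuinfo}
    grep -m1 '^flags' "$cpuinfo" | grep -qw monitor
}

if has_monitor_flag; then
    echo "CPUID advertises MONITOR/MWAIT to this guest"
else
    echo "MONITOR/MWAIT not advertised (or masked by the hypervisor)"
fi
```

A visible flag on a crashing instance would be consistent with the CPUID masking theory, though it would not prove where the bug lies.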

I don't know at this point where the true bug lies. It could be that the intel_idle code needs to make an exception to its detection routines under Xen. It could be that some of Amazon's Xen hosts are configured differently (wrt CPUID masking for mwait) than others. It could be any of a number of related things. However, I suspect new AMIs for Maverick on EC2 that disable mwait from the commandline in grub.conf/menu.lst per above might fix this. I'll try making my own AMIs with this change in the morning and see how it goes.

Brandon Black (blblack) wrote :

I forgot to add above: on the E5410 c1.xlarge's that do boot successfully, the kernel output contains:

Oct 26 07:37:55 ip-10-243-51-207 kernel: [ 0.210255] intel_idle: MWAIT substates: 0x2220
Oct 26 07:37:55 ip-10-243-51-207 kernel: [ 0.210257] intel_idle: does not run on family 6 model 23

Which I believe means that intel_idle figured out that it needs to disable itself on these. The E5506's are model 26 rather than 23. The intel_idle code has a case statement that switches on this model number. Model 23 (0x17) is commented out for "FUTURE_USE" and thus falls through to the "does not run" condition with the output above. Model 26 (0x1A) has a case statement and will attempt to use intel_idle support.
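
The model check described above can be pictured as a simple dispatch. The following is only a sketch of the behaviour reported in this thread, written in shell rather than taken from the kernel source. `intel_idle_would_probe` and `cpu_family_model` are hypothetical names, and only the two models discussed here are represented.

```shell
# intel_idle_would_probe FAMILY MODEL
# Succeeds if, per the description above, intel_idle would try to run.
intel_idle_would_probe() {
    family=$1
    model=$2
    [ "$family" -eq 6 ] || return 1
    case "$model" in
        26) return 0 ;;   # 0x1A (E5506, X5550): driver attempts to run
        *)  return 1 ;;   # anything else, including 23 (0x17, E5410),
                          # is treated as "does not run" in this sketch
    esac
}

# cpu_family_model [CPUINFO_FILE]
# Prints "family model" for the first CPU listed in a cpuinfo-style file.
cpu_family_model() {
    awk -F': *' '
        /^cpu family/    { f = $2 }
        /^model[ \t]*:/  { print f, $2; exit }
    ' "${1:-/proc/cpuinfo}"
}
```

On an x86 instance, `intel_idle_would_probe $(cpu_family_model) && echo "at risk"` would flag the model 26 hosts mentioned above. The real driver handles more models than this sketch does.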

Brandon Black (blblack) wrote :

So far my test instances with one or both of the MWAIT-related kernel flags have given even worse results than the original: They boot showing intel_idle disabled on E5410 nodes only, but the (assumed) E5506 nodes just terminate themselves quickly with no console log output at all (even after waiting a while). I've opened a web support ticket with Amazon referencing my test AMI and this bug report to ask for their input.

Mikael Gueck (gumi) wrote :

I just tried to launch 16 * m2.4xlarge instances with ami-e43e0b90 in the eu-west-1b area, and not a single one would boot up successfully, because of this bug. Any workaround yet?

Brandon Black (blblack) wrote :

Well, I had a hunch this morning that perhaps my test AMI was faulty (perhaps some stupid issue related to block-device mapping, etc, which varies between the variations on c1.xlarge), since it wasn't packaged by the same methods/tools as the official one.

It seems this may be the case. Going off the hint from Mikael that m2.4xlarge may exhibit the problems more reliably, I did the following experiment this morning using EBS root persistence to make the change, rather than custom instance-store AMIs:

1) Booted ami-548c783d (Maverick 64-bit EBS official) on m1.large in us-east-1.
2) Logged into this machine and edited /boot/grub/menu.lst manually to add "intel_idle.max_cstate=0 idle=nomwait" to the kernel bootflags.
3) Rebooted, instance came up fine with messages showing intel_idle disabled.
4) Stopped the instance, used ec2-modify-instance-attribute to move it to type m2.4xlarge
5) Booted on m2.4xlarge successfully, no crash (cpuinfo shows Xeon X5550, which is also "model 26" like the failing c1.xlarges)
6) Edited menu.lst to remove the added bootflags and rebooted the instance again, (staying on same m2.4xlarge hardware)
7) Instance crashed on boot in intel_idle code as always

Given these results, I think the kernel flags will work around this issue; I just built a bad test AMI during my first tests yesterday. Could someone rebuild a set of Maverick AMIs with these flags added from the get-go using whatever the official method of packaging Maverick AMIs is, for public testing among those of us experiencing the bug?
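
The manual edit from step 2 could be scripted roughly as follows. This is a sketch under assumptions: `add_boot_flags` is a hypothetical helper, it expects a grub-legacy menu.lst whose boot entries use `kernel` lines, and it leaves a `.bak` copy next to the file.

```shell
# add_boot_flags MENU_LST
# Appends the two workaround flags to every "kernel" line that does not
# already carry intel_idle.max_cstate=, so running it twice is harmless.
add_boot_flags() {
    sed -i.bak '/^[[:space:]]*kernel[[:space:]]/{/intel_idle\.max_cstate=/!s/$/ intel_idle.max_cstate=0 idle=nomwait/;}' "$1"
}

# Example: sudo sh -c '. ./add-flags.sh; add_boot_flags /boot/grub/menu.lst'
# (add-flags.sh is a placeholder name for a file holding the function above)
```

After the next reboot, `cat /proc/cmdline` shows whether the flags actually reached the kernel.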

Scott Moser (smoser) wrote :

I just created a rebundled instance.
Please try ami-d258acbb
The AMI is owned by my personal account, not Canonical's.

- launch ami-548c783d
  us-east-1 ami-548c783d canonical ebs/ubuntu-maverick-10.10-amd64-server-20101007.1
- modify /boot/grub/menu.lst to have:
  - # kopt=root=LABEL=uec-rootfs ro
  + # kopt=root=LABEL=uec-rootfs ro intel_idle.max_cstate=0 idle=nomwait
- update grub
 sudo update-grub-legacy-ec2
  # keep the local version
- clean up
 sudo rm -Rf /var/lib/cloud/ /home/ubuntu/.ssh /root/.ssh
- sudo poweroff
- ec2-create-image
 ec2-create-image i-02bf1e6f --name "smoser-lp-651370-ubuntu-maverick-10.10-amd64-server-20101007.1" --description "smoser's rebundle of ubuntu-maverick-10.10-amd64-server-20101007.1 to address LP: #651370"
- ec2-modify-image-attribute --launch-permission --add all ami-d258acbb

Mikael Gueck (gumi) wrote :

Brandon's and Scott's workaround works for me partly, but the kernel on an instance started in such a way seems to detect only 32 GB of memory even for a m2.4xlarge instance which should have 68.4 GB available, according to the EC2 instances page. Is this a side-effect of the workaround, or a completely separate bug?

Maverick results:
ubuntu@ip-10-230-9-87:~$ uname -a
Linux ip-10-230-9-87 2.6.35-22-virtual #35-Ubuntu SMP Sat Oct 16 23:19:29 UTC 2010 x86_64 GNU/Linux
ubuntu@ip-10-230-9-87:~$ ec2metadata --instance-type
m2.4xlarge
ubuntu@ip-10-230-9-87:~$ free
             total       used       free     shared    buffers     cached
Mem:      32810684     667628   32143056          0       6444      32152
-/+ buffers/cache:     629032   32181652
Swap:            0          0          0

Expected results (from a SUSE 11 guest):
ip-10-230-45-187:~ # uname -a
Linux ip-10-230-45-187 2.6.32.19-0.3-ec2 #1 SMP 2010-09-17 20:28:21 +0200 x86_64 x86_64 x86_64 GNU/Linux
ip-10-230-45-187:~ # curl http://169.254.169.254/latest/meta-data/instance-type
m2.4xlarge
ip-10-230-45-187:~ # free
             total       used       free     shared    buffers     cached
Mem:      71705116    2361584   69343532          0      10972     126424
-/+ buffers/cache:    2224188   69480928
Swap:            0          0          0
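
To compare these totals with the advertised instance size, the `free` figures (reported in KiB by default) can be converted to GiB. A small sketch, with `kib_to_gib` as a hypothetical helper name:

```shell
# kib_to_gib KIB
# Prints the given number of KiB as GiB, rounded to one decimal place.
kib_to_gib() {
    awk -v kib="$1" 'BEGIN { printf "%.1f\n", kib / 1048576 }'
}

kib_to_gib 32810684   # Maverick guest: prints 31.3
kib_to_gib 71705116   # SUSE guest:     prints 68.4
```

Read this way, the SUSE figure lines up with the 68.4 GB listed for m2.4xlarge, while the Maverick guest sees a little under half of it.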

Brandon Black (blblack) wrote :

I wasn't able to boot on ami-d258acbb on m2.4xlarge. It seemed to come up without the special kernel options:

[ 0.000000] Linux version 2.6.35-22-virtual (buildd@allspice) (gcc version 4.4.5 (Ubuntu/Linaro 4.4.4-14ubuntu4) ) #33-Ubuntu SMP Sun Sep 19 21:05:42 UTC 2010 (Ubuntu 2.6.35-22.33-virtual 2.6.35.4)
[ 0.000000] Command line: root=LABEL=uec-rootfs ro console=hvc0

And then hung in intel_idle as expected. Also, confirmed apparent 32GB memory limit on this kernel + machine type.

Brandon Black (blblack) wrote :

What's the method for making the S3 AMIs, by the way? When I tried before, I just did the standard ec2-bundle-vol steps inside a fixed Maverick instance. My first attempts failed because the root device did not have LABEL=uec-rootfs in the newly launched instances. For the second generation I manually switched the root to /dev/sda1, but had other mysterious boot failures. Is there some standard tool or script used to package the official AMIs that we can use to produce identical results (with small changes)?

Scott Moser (smoser) wrote :

Brandon,
 sorry about failing to get the command line changed in the AMI I rebundled. I really thought I had tested that it had the proper command line before posting here. The problem in my steps above was selecting "keep local version". I should have chosen "use maintainer's version".
 Regarding simple changes to the S3 AMIs, the easiest thing to do (and actually what I would recommend for *non*-simple changes too) is to download the .tar.gz file from http://uec-images.ubuntu.com/releases/maverick/ , extract it, mount it loopback, modify files (or chroot and modify files), then run uec-resize-image (the downloaded filesystem image is only 2G), then euca-bundle-image, euca-publish-image...

  I also registered 'ami-aa42b6c3' and verified boot on a t1.micro and checked it has the command line. John is hoping to get rebuilt kernel images that would have these options in the config. He should point to them sometime soon.

Scott Moser (smoser) wrote :

Mikael,
  I opened bug 667796 to address the 32G issue.
Brandon,
 I opened bug 667793 to address euca-bundle-vol not copying the filesystem label.

I copied you each on the respective bugs.

John Johansen (jjohansen) wrote :

There are maverick test kernels at

kernel.ubuntu.com/~jj/linux-image-2.6.35-23-virtual_2.6.35-23.36~ec2_amd64.deb
kernel.ubuntu.com/~jj/linux-image-2.6.35-23-virtual_2.6.35-23.36~ec2_i386.deb

Mikael Gueck (gumi) wrote :

John Johansen's suggested -23.36 kernel booted, but still exhibited bug 667796.

Linux ip-10-230-9-131 2.6.35-23-virtual #36~ec2 SMP Thu Oct 28 15:07:00 UTC 2010 x86_64 GNU/Linux

[ 0.000000] PERCPU: Embedded 30 pages/cpu @ffff88000e8c7000 s91520 r8192 d23168 u122880
[ 0.000000] pcpu-alloc: s91520 r8192 d23168 u122880 alloc=30*4096
[ 0.000000] pcpu-alloc: [0] 0 [0] 1 [0] 2 [0] 3 [0] 4 [0] 5 [0] 6 [0] 7
[8698363.527286] trying to map vcpu_info 0 at ffff88000e8d2020, mfn 10569b2, offset 32
[8698363.527290] cpu 0 using vcpu_info at ffff88000e8d2020
[8698363.527292] trying to map vcpu_info 1 at ffff88000e8f0020, mfn 1056994, offset 32
[8698363.527294] cpu 1 using vcpu_info at ffff88000e8f0020
...

ubuntu@ip-10-230-9-131:~$ free
             total       used       free     shared    buffers     cached
Mem:      32810684     669128   32141556          0       7016      32268
-/+ buffers/cache:     629844   32180840
Swap:            0          0          0

Scott Moser (smoser) wrote :

Mike,
  Thanks for your test. It's interesting that we still see the time travel of roughly 100 days in your dmesg.
  I gather the system was otherwise usable, other than only showing 32G of memory?

Scott Moser (smoser) wrote :

I'm attaching a console output of a lucid 10.04 from:
us-east-1 ami-4a0df923 canonical ebs/ubuntu-lucid-10.04-amd64-server-20101020

This shows very interesting time travel (both forward and backward) on an otherwise functional instance.
Thus, while the kernel time messages are not pretty looking, they don't necessarily correlate with this bug occurring.

Changed in linux (Ubuntu Maverick):
status: New → In Progress
Changed in linux (Ubuntu):
status: Confirmed → In Progress
Stefan Bader (smb)
description: updated
Changed in linux (Ubuntu):
assignee: nobody → Andy Whitcroft (apw)
status: In Progress → Triaged
Changed in linux (Ubuntu Maverick):
assignee: nobody → John Johansen (jjohansen)
Stefan Bader (smb)
Changed in linux (Ubuntu Maverick):
importance: Undecided → Medium
Brandon Black (blblack) wrote :

Stefan: the ~32 vs ~64GB memory issue is very likely orthogonal and has a separate bug now (bug 667796). This issue is solely about intel_idle vs certain CPU types under Amazon's EC2 (Xen) environment. m2.4xlarge in us-east reproduces the crash on boot readily (and also happens to exhibit the memory limit issue), and c1.xlarge reproduces it some of the time (depending which hardware you are randomly assigned).

Scott Moser (smoser) wrote :

@Brandon,
  Stefan's comment in the SRU justification about 68G of memory (which should have been 64) is really only suggesting that selection of a larger instance size seems more likely to land you on newer hardware where failure is more likely.

Stefan Bader (smb) wrote :

@Brandon, sorry for the late response; I have been traveling. And yes, Scott's reply is right. The comment about 68G was made because selecting this size seems to trigger the crash more reliably. It has nothing to do with the memory size itself, only that requesting that size seems to get you a recent Intel box behind the covers. I found this while looking at another bug about 68G not being detected correctly in Maverick, where I never got the instance up because of this crash.

Launchpad Janitor (janitor) wrote :

This bug was fixed in the package linux - 2.6.37-3.11

---------------
linux (2.6.37-3.11) natty; urgency=low

  [ Andy Whitcroft ]

  * Revert "ubuntu: AUFS -- update to
    b37c575759dc4535ccc03241c584ad5fe69e3b25"
  * Revert "ubuntu: AUFS -- track changes to the arguements to fop fsync()"
  * Revert "ubuntu: AUFS -- update to standalone 2.6.35-rcN as at 20100601"
  * Revert "ubuntu: AUFS -- update to standalone 2.6.34 as at 20100601"
  * Revert "ubuntu: AUFS -- aufs2 base patch for linux-2.6.34"
  * [Config] Disable intel_idle for -virtual kernels
    - LP: #651370
  * [Config] enforcer -- ensure we never enable CONFIG_IMA
  * debian -- pass the correct flavour name when checking configs
  * [Config] enforcer -- ensure CONFIG_INTEL_IDLE is off for -virtual
  * [Config] ensure CONFIG_IPV6=y for powerpc
  * [Config] enforcer -- ensure CONFIG_IPV6=y
  * ubuntu: AUFS -- aufs2-base.patch aufs2.1-36-UNRELEASED-20101103
  * ubuntu: AUFS -- aufs2-standalone.patch aufs2.1-36-UNRELEASED-20101103
  * ubuntu: AUFS -- update to aufs2.1-36-UNRELEASED-20101103
  * ubuntu: AUFS -- re-enable
  * ubuntu: AUFS -- track changes to work queue initialisation
  * ubuntu: AUFS -- track changes to llseek in v2.6.37-rc1
  * SAUCE: fbcon -- fix race between open and removal of framebuffers
  * SAUCE: fbcon -- fix OOPs triggered by race prevention fixes
    - LP: #614008
  * SAUCE: drm -- stop early access to drm devices

  [ Jeremy Kerr ]

  * [Config] Build-in powermac ZILOG serial driver
    - LP: #673346

  [ Kees Cook ]

  * SAUCE: nx-emu: use upstream ASLR when possible

  [ Tim Gardner ]

  * [Config] Use correct be2iscsi module name in d-i/modules/scsi-modules
    - LP: #628776

  [ Upstream Kernel Changes ]

  * i386: NX emulation
  * nx-emu: drop exec-shield sysctl, merge with disable_nx
  * nx-emu: standardize boottime message prefix
  * mmap randomization for executable mappings on 32-bit
  * exec-randomization: brk away from exec rand area
 -- Andy Whitcroft <email address hidden> Thu, 11 Nov 2010 23:46:37 +0000

Changed in linux (Ubuntu):
status: Triaged → Fix Released
marstonstudio (jon-marstonstudio) wrote :

Will a fix for this be backported to Maverick?

Martin Pitt (pitti) wrote : Please test proposed package

Accepted linux into maverick-proposed, the package will build now and be available in a few hours. Please test and give feedback here. See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on how to enable and use -proposed. Thank you in advance!

Changed in linux (Ubuntu Maverick):
status: In Progress → Fix Committed
tags: added: verification-needed
Brad Figg (brad-figg) wrote :

This bug is awaiting verification that the kernel in -proposed solves the problem. Please test the kernel and update this bug with the results. If the problem is solved, change the tag 'verification-needed' to 'verification-done'.

If verification is not done by one week from today, this fix will be dropped from the source code, and this bug will be closed.

See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on how to enable and use -proposed. Thank you!

Scott Moser (smoser) wrote :

I've verified this:

 * start instance of (t1.micro)
   # us-east-1 ami-548c783d ebs/ubuntu-maverick-10.10-amd64-server-20101007.1
 * ssh to instance, install kernel, reboot
   % wget https://launchpad.net/ubuntu/+archive/primary/+files/linux-image-2.6.35-24-virtual_2.6.35-24.42_amd64.deb
   % sudo dpkg -i linux-image-2.6.35-24-virtual_2.6.35-24.42_amd64.deb
   % sudo reboot
 * ssh to instance again, verify it runs the new kernel, then shut down
   % uname -a
     Linux ip-10-202-31-117 2.6.35-24-virtual #42-Ubuntu SMP Thu Dec 2 05:15:26 UTC 2010 x86_64 GNU/Linux
   % sudo poweroff

 * for each type c1.xlarge, m2.2xlarge
   $ ec2-stop-instances ${IID}
   $ ec2-modify-instance-attribute --instance-type ${ITYPE} ${IID}
   $ ec2-start-instances ${IID}
   # 5 times test reboot (note, the cpu info hopefully
   # shows E5506 where it failed before)
   $ for i in 1 2 3 4 5; do ssh $EC2_HOST "uname -a; uptime;
         grep Xeon /proc/cpuinfo | head -n 1; sudo reboot" &&
         echo "$i: passed" || echo "$i: failed"; sleep 2m; done
   $ ssh $EC2_HOST sudo poweroff

I got an instance with X5550 in both c1.xlarge and m2.2xlarge and successfully rebooted and connected 5 times in a row.

tags: added: verification-done
removed: verification-needed
Stefan Bader (smb) wrote :

I can confirm that I am able to boot the latest kernel in an m2.4xlarge instance, which was usually crashing because it landed on hardware that triggered the intel_idle driver to load.

Ed Swierk (eswierk) wrote :

Also confirmed that I can boot the kernel from #30 in an m2.4xlarge instance. It still sees only 32 GB of memory, though (bug 667796).

Launchpad Janitor (janitor) wrote :

This bug was fixed in the package linux - 2.6.35-24.42

---------------
linux (2.6.35-24.42) maverick-proposed; urgency=low

  [ Brad Figg ]

  - LP: #683422

  [ Colin Ian King ]

  * SAUCE: Allow registration of handler to multiple WMI events with same
    GUID
    - LP: #676997
  * SAUCE: Add WMI hotkeys support for Dell All-In-One series
    - LP: #676997
  * [Config] Enable Dell All-In-One WMI Hotkeys driver
    - LP: #676997

  [ David Woodhouse ]

  * [Upstream] Call acpi_video_register() in intel_opregion_init() failure
    path
    - LP: #615947

  [ Manoj Iyer ]

  * SAUCE: enable rfkill for rtl8192se driver
    - LP: #640992
  * SAUCE: Enable jack sense for Thinkpad Edge 11
    - LP: #677210

  [ Tim Gardner ]

  * [Config] Use correct be2iscsi module name in d-i/modules/scsi-modules
    - LP: #628776
  * [Config] Added NFS and related modules to virtual flavour
    - LP: #659084
  * [Config] Add support for cross compiling armel
  * Simplify the use of CROSS_COMPILER

  [ Upstream Kernel Changes ]

  * Revert "(pre-stable) ACPI: enable repeated PCIEXP wakeup by clearing
    PCIEXP_WAKE_STS on resume"
  * Revert "(pre-stable) mm: Move vma_stack_continue into mm.h"
  * x86, cpu: After uncapping CPUID, re-run CPU feature detection
    - LP: #672664
  * ALSA: sound/pci/rme9652: prevent reading uninitialized stack memory
    - LP: #672664
  * ALSA: oxygen: fix analog capture on Claro halo cards
    - LP: #672664
  * ALSA: hda - Add Dell Latitude E6400 model quirk
    - LP: #643891, #672664
  * ALSA: prevent heap corruption in snd_ctl_new()
    - LP: #672664
  * ALSA: rawmidi: fix oops (use after free) when unloading a driver module
    - LP: #672664
  * hwmon: (lis3) Fix Oops with NULL platform data
    - LP: #672664
  * USB: fix bug in initialization of interface minor numbers
    - LP: #672664
  * usb: musb: gadget: fix kernel panic if using out ep with FIFO_TXRX
    style
    - LP: #672664
  * usb: musb: gadget: restart request on clearing endpoint halt
    - LP: #672664
  * HID: hidraw, fix a NULL pointer dereference in hidraw_ioctl
    - LP: #672664
  * HID: hidraw, fix a NULL pointer dereference in hidraw_write
    - LP: #672664
  * ahci: fix module refcount breakage introduced by libahci split
    - LP: #672664
  * lib/list_sort: do not pass bad pointers to cmp callback
    - LP: #672664
  * ACPI: invoke DSDT corruption workaround on all Toshiba Satellite
    - LP: #672664
  * oprofile: Add Support for Intel CPU Family 6 / Model 29
    - LP: #672664
  * oprofile, ARM: Release resources on failure
    - LP: #672664
  * RDMA/cxgb3: Turn off RX coalescing for iWARP connections
    - LP: #672664
  * drm/radeon/kms: fix bad cast/shift in evergreen.c
    - LP: #672664
  * drm/radeon/kms: avivo cursor workaround applies to evergreen as well
    - LP: #672664
  * ARM: 6400/1: at91: fix arch_gettimeoffset fallout
    - LP: #672664
  * ARM: 6395/1: VExpress: Set bit 22 in the PL310 (cache controller)
    AuxCtlr register
    - LP: #672664
  * V4L/DVB: gspca - main: Fix a crash of some webcams on ARM arch
    - LP: #672664
  * V4L/DVB: gspca - sn9c20x: Bad transfer size of Bayer images
    - LP: #672664
  * mmc: sdhci-s3c: fix NULL ptr acc...


Changed in linux (Ubuntu Maverick):
status: Fix Committed → Fix Released