Ubuntu

t1.micro instance hangs when installing java

Reported by Gabriel Nell on 2010-09-09
This bug affects 73 people
Affects                  | Status | Importance | Assigned to
Release Notes for Ubuntu |        | Undecided  | Andy Whitcroft
linux (Ubuntu)           |        | Undecided  | Unassigned
  Lucid                  |        | Undecided  | Unassigned
  Maverick               |        | Undecided  | Unassigned
  Natty                  |        | High       | Canonical Kernel Team
linux-ec2 (Ubuntu)       |        | Medium     | Unassigned
  Lucid                  |        | Medium     | Unassigned
  Maverick               |        | Undecided  | Unassigned
  Natty                  |        | Medium     | Unassigned
openjdk-6 (Debian)       | New    | Undecided  | Unassigned
openjdk-6 (Ubuntu)       |        | Undecided  | Unassigned
  Lucid                  |        | Undecided  | Unassigned
  Maverick               |        | Undecided  | Unassigned
  Natty                  |        | Undecided  | Unassigned
sun-java6 (Ubuntu)       |        | Undecided  | Unassigned
  Lucid                  |        | Undecided  | Unassigned
  Maverick               |        | Undecided  | Unassigned
  Natty                  |        | Undecided  | Unassigned

Bug Description

Binary package hint: cloud-init

I booted the 32bit EBS lucid AMI (ami-1234de7b) for a t1.micro instance. I attempted to install Sun Java. The instance hung during the install. Repros every time. Only repros on t1.micro instances. I tried adding swap in case it was an out-of-memory condition, and it still repro'd. No reboots or anything else like that were involved so it's not the same issue as #634102

Console log snippet (full one attached):

[ 525.195499] ------------[ cut here ]------------
[ 525.195515] kernel BUG at /build/buildd/linux-ec2-2.6.32/arch/x86/mm/hypervisor.c:461!
[ 525.195522] invalid opcode: 0000 [#1] SMP
[ 525.195527] last sysfs file: /sys/kernel/uevent_seqnum
[ 525.195531] Modules linked in: ipv6
[ 525.195537]
[ 525.195541] Pid: 8663, comm: java Not tainted (2.6.32-308-ec2 #15-Ubuntu)
[ 525.195545] EIP: 0061:[<c0118550>] EFLAGS: 00010282 CPU: 0
[ 525.195553] EIP is at T.566+0x150/0x180
[ 525.195557] EAX: ffffffea EBX: c1f17e70 ECX: 00000002 EDX: 00000000
[ 525.195561] ESI: 0d537000 EDI: 00000004 EBP: c1f17ebc ESP: c1f17e60
[ 525.195565] DS: 007b ES: 007b FS: 00d8 GS: 00e0 SS: 0069
[ 525.195580] Process java (pid: 8663, ti=c1f16000 task=e5373200 task.ti=c1f16000)
[ 525.195588] Stack:
[ 525.195591] c103a020 c1f17ee0 80000000 00000061 0000000e 00000000 c1c81000 0d537061
[ 525.195603] <0> 80000004 00000000 00000000 00000040 00000001 ffffffea c1f17ee0 00000001
[ 525.195617] <0> 00000000 00007ff0 00000000 00000000 0d537001 00000004 c14e5000 c1f17efc
[ 525.195632] Call Trace:
[ 525.195640] [<c01188dd>] ? xen_l3_entry_update+0x12d/0x1b0
[ 525.195647] [<c01145de>] ? pud_populate+0x9e/0xc0
[ 525.195654] [<c01aed59>] ? __pmd_alloc+0x99/0xa0
[ 525.195659] [<c01b2eb9>] ? handle_mm_fault+0x509/0x5b0
[ 525.195665] [<c01b7b2a>] ? do_munmap+0x22a/0x2b0
[ 525.195672] [<c05370f9>] ? do_page_fault+0x119/0x340
[ 525.195677] [<c0536fe0>] ? do_page_fault+0x0/0x340
[ 525.195683] [<c0535525>] ? error_code+0x3d/0x44
[ 525.195687] Code: c3 64 a0 cc 40 6f c0 84 c0 75 28 8b 5d a8 b9 01 00 00 00 31 d2 be f0 7f 00 00 e8 dc 8a fe ff 85 c0 79 d6 0f 0b eb fe 8d 74 26 00 <0f> 0b eb fe 0f 0b eb fe 8b 45 a8 31 c9 ba 01 00 00 00 c7 04 24
[ 525.195754] EIP: [<c0118550>] T.566+0x150/0x180 SS:ESP 0069:c1f17e60
[ 525.195769] ---[ end trace 9706b235d81a7968 ]---

Script which can 100% repro the problem:

#!/bin/bash

export DEBIAN_FRONTEND=noninteractive
add-apt-repository "deb http://archive.canonical.com/ lucid partner"
apt-get update
echo 'sun-java6-bin shared/accepted-sun-dlj-v1-1 boolean true
sun-java6-jdk shared/accepted-sun-dlj-v1-1 boolean true
sun-java6-jre shared/accepted-sun-dlj-v1-1 boolean true
sun-java6-jre sun-java6-jre/stopthread boolean true
sun-java6-jre sun-java6-jre/jcepolicy note
sun-java6-bin shared/present-sun-dlj-v1-1 note
sun-java6-jdk shared/present-sun-dlj-v1-1 note
sun-java6-jre shared/present-sun-dlj-v1-1 note
'|debconf-set-selections
apt-get -y install sun-java6-jdk

Gabriel Nell (gabriel-nell) wrote :

Adding shell script which repro's the problem reliably.

Scott Moser (smoser) on 2010-09-09
affects: cloud-init (Ubuntu) → linux-ec2 (Ubuntu)
Scott Moser (smoser) on 2010-09-09
Changed in linux-ec2 (Ubuntu):
importance: Undecided → Medium
status: New → Confirmed
Scott Moser (smoser) wrote :

I'm currently testing this on maverick daily
us-east-1 ami-00a85d69 ebs/ubuntu-maverick-daily-i386-server-20100910

So far, there is no crash as in lucid. However, the system is very unresponsive. There is a process running:
/usr/lib/jvm/java-6-sun-1.6.0.20/bin/java -client -Xshare:dump -Xmx256m -XX:PermSize=128m
that takes up a large amount of cpu. watching 'top' shows cpu split between this process and rsyslogd.

syslog is 81M at the moment, filled with kernel crash info.
I'm attaching 1000 lines from the beginning of /var/log/syslog and from the end.

Kent Forschmiedt (kentf) wrote :

Linux takes a page fault, and in the process of updating the page tables to map the new page, it needs to allocate a new "PMD", or mid-level page directory entry. The PV kernel cannot write its own page tables, Xen has to do that, so a "multicall" that encapsulates two hypercalls is dispatched to Xen:
* update_va_mapping: Map the PMD page in the guest memory and flush the mapping cache (TLB)
* mmu_update: Update the contents of the PMD page
Xen returns a fatal error which causes the guest kernel to crash.

The "multicall" scheme does not return the failure status. Most of the failure paths have no logging, and the log levels are not usually tuned up enough to show anything.

The next steps are:
Turn up the Xen logging level and repro.
Add logging to all of the failure cases in the hypercalls and load the test host with the diagnostic Xen build.
Repro and see what we learn.

Some other notes:
* I wonder how easy it is to repro. I wrote some simple test programs and only got the expected out of memory errors. It may be that the memory has to run out just in time to need a new PMD. That is trickier than it sounds, because it depends on the number of virtual pages allocated vs. exactly how many have actually been touched.
* I studied the memory manager code in the Ubuntu kernel extensively and compared with other kernels that did not repro the problem. It may be that this bug exists in many, most or all versions, but needs exactly the right circumstances to trigger it.
* Good news - the micro is a single-cpu instance, so this is not a race in the guest and is almost certainly not a race in Xen.
* Some of the warning oopses in the Ubuntu 2.6.35 kernel are similar to this. I have not studied that yet.

Gabriel Nell (gabriel-nell) wrote :

Thanks Kent. Regarding the reproducibility, it happens every time I've tried it on the 32bit Lucid AMI.

Scott Moser (smoser) wrote :

I've done some more testing/poking and here's what I've found:

suite    | arch | java          | result
lucid    | 32   | sun-java6-jdk | fail, system unreachable
lucid    | 32   | openjdk-6-jdk | fail, system unreachable
maverick | 32   | sun-java6-jdk | bad performance, dpkg hang
maverick | 32   | openjdk-6-jdk | bad performance, dpkg hang
lucid    | 64   | sun-java6-jdk | no failure
lucid    | 64   | openjdk-6-jdk | no failure
maverick | 64   | sun-java6-jdk | no failure
maverick | 64   | openjdk-6-jdk | no failure

the maverick tests were performed on 20100913 daily build (ami-48897c21/ami-46897c2f in us-east-1). The lucid tests were performed on ubuntu-lucid-10.04-amd64-server-20100827 (ami-1234de7b/ami-1634de7f in us-east-1).

more information on the 'result' column above:
 * fail, system unreachable: system crashes, ssh connection is terminated, system cannot be ssh'd to. The console log will eventually show the kernel trace as attached here.
 * bad performance, dpkg hang: upon installation, rsyslogd and the 'java' process mentioned above will peg CPU. The java pid is not killable. A kill -9 will simply send it to <defunct> (per 'ps' output). The dpkg process will not return, waiting for the java process.
 * no failure: class data sharing is not enabled on amd64, so the 'java -Xshare:dump' is not run, and apt installation proceeds and finishes. The preinstall script for jre-headless runs 'java -client -Xshare:dump -XX:PermSize=128m' only on i386 or sparc.
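
As a rough illustration of the architecture condition described above, the following sketch checks whether an install would be expected to run the class-data-sharing dump. The function name runs_cds_dump is hypothetical and is not taken from the package scripts; it only encodes the "i386 or sparc" rule stated in the last bullet.

```shell
#!/bin/bash
# Hypothetical helper (not from the jre-headless package scripts):
# succeeds when the given Debian architecture is one where, per the
# comment above, the install runs 'java -client -Xshare:dump'.
runs_cds_dump() {
    case "$1" in
        i386|sparc) return 0 ;;
        *)          return 1 ;;
    esac
}

# Report on the local machine when dpkg is available.
if command -v dpkg >/dev/null 2>&1; then
    arch="$(dpkg --print-architecture)"
    if runs_cds_dump "$arch"; then
        echo "$arch: install is expected to run the -Xshare:dump step"
    else
        echo "$arch: no -Xshare:dump step expected during install"
    fi
fi
```

On an amd64 image this should report that no dump step is expected, which matches the "no failure" rows in the table.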

So, if you want to avoid this, you can just use an amd64 image. Additionally, for debugging, the maverick i386 images will be better as the system can be looked at (albeit painfully) while it is failing.

Kent Forschmiedt (kentf) wrote :

The failing code is identical in Lucid and Maverick, so I'm thinking the difference between the two is that the Lucid fault occurs on a kernel page, which is fatal, while the Maverick failure occurs on a user page, which kills the process but not the kernel.

Scott Moser (smoser) on 2010-09-16
tags: added: ec2-images
tags: added: review-request
Kent Forschmiedt (kentf) wrote :

Still chewing on this. DomU should send Xen a kernel, readonly PMD entry, but what Xen gets is a read/write user entry. I am working on reconciling what DomU thinks it did with what Xen sees.

Thierry Carrez (ttx) on 2010-09-21
tags: added: server-mro
Matthias Klose (doko) wrote :

Re #3: the java -client -dump process is an optimization for the Hotspot client VM, only available on 32bit machines. Therefore you don't see it on amd64.

Branden Makana wrote :

Is there any workaround for this at the moment?

Launch the instance as m1.small, then install java, and save as a different
ami.

From then, you can launch the saved ami as t1.micro.

I must say it's an ugly workaround, but it works.

On Sun, Oct 10, 2010 at 5:45 AM, Branden Makana
<email address hidden>wrote:

> Is there any workaround for this at the moment?

@DanielDaniel, Branden,
  The easiest workaround is:
a.) start the instance as m1.small or m1.large (depending on if you want i386 or x86_64).
b.) install java
c.) stop the instance (ec2-stop-instances or /sbin/halt from inside the instance)
d.) modify the instance to be t1.micro (ec2-modify-instance-attribute --instance-type t1.micro <instance_id>)
e.) ec2-start-instances

Doing that means you don't have to register a new AMI, and will only cost time setting it up and 1 hour of m1.small or m1.large.
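
The steps above can be sketched as a small dry-run helper. It only prints the ec2-* commands named in this thread and executes nothing. The function name resize_plan and the placeholder instance ID are hypothetical, and the sketch assumes you check that the instance has actually reached the stopped state before running each modify step.

```shell
#!/bin/bash
# Print (do not run) the resize-around-install sequence for one instance.
# $1: instance id, $2: temporary larger type (defaults to m1.small).
resize_plan() {
    local iid="$1"
    local big="${2:-m1.small}"
    echo "ec2-stop-instances $iid"
    echo "ec2-modify-instance-attribute --instance-type $big $iid"
    echo "ec2-start-instances $iid"
    echo "# ssh in and install java here"
    echo "ec2-stop-instances $iid"
    echo "ec2-modify-instance-attribute --instance-type t1.micro $iid"
    echo "ec2-start-instances $iid"
}

# Placeholder id for illustration only.
resize_plan "i-00000000"
```

Printing the plan rather than executing it keeps the sketch safe to run anywhere and leaves the wait between stop and modify to the operator.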

Paul Willis (info-paulwillis) wrote :

I believe you can install Java fine on a t1.micro x86_64; this bug only affects the 32-bit version.

Scott Moser (smoser) on 2010-10-18
Changed in linux (Ubuntu):
importance: Undecided → High
status: New → Confirmed

Thanks all for the workaround, I've tried it on several of my instances and it does work.

Davepar (dualrudder) wrote :

The workaround is only useful when you're setting up a new instance. It's useless for the instance I already invested a day setting up. It would be a really good idea to put a warning somewhere in the docs for micro instances. Once AWS starts handing out free micro instances by the thousands, the urgency of this bug will increase dramatically.

On Mon, 25 Oct 2010, Davepar wrote:

> The workaround is only useful when you're setting up a new instance.
> It's useless for the instance I already invested a day setting up. It
> would be a really good idea to put a warning somewhere in the docs for
> micro instances. Once AWS starts handing out free micro instances by the
> thousands, the urgency of this bug will increase dramatically.

Well, you actually can apply the workaround to an existing instance. If
the instance is hung, you should be able to stop it (/sbin/halt). Then
start it as an m1.small, run 'apt-get -f install', then stop again, and
restart as t1.micro.
Additionally, it's only an issue with i386 instances, and t1.micro can be
either amd64 or i386. amd64 are generally more useful as you have higher
powered options than you do with i386.

I'm not trying to discount the severity of the bug. It is a real kernel
bug that is exposed by java, and could also be exposed by another
application.

Davepar (dualrudder) wrote :

Thanks Scott. It didn't occur to me to try switching my instance to small,
do the install, and then switch back to micro. I'll give that a try.

Also, for some reason I thought the amd64 builds of Ubuntu were lacking in
some components. Maybe that's only true for client, but I definitely saw a
recommendation somewhere to use i386 unless there was a real need for > 4GB
of memory.

Dave

I'm a new EC2 user and have just come across this.

How do I start a t1.micro instance as m1.small and retain the root volume I'm using in the t1.micro instance?

I have installed other apps and have configuration and now find I need to change from openjdk to sun-java6-jdk. I don't want to have to go through it all again, but when I use ec2-run-instances I get a new volume, so I can't install the JDK where I want it!

TIA

Scott Moser (smoser) wrote :

ultan,
  ec2-stop-instances ${IID}
  ec2-modify-instance-attribute --instance-type m1.small ${IID}
  ec2-start-instances ${IID}
  ssh to instance and do whatever you want
  ec2-stop-instances ${IID}
  ec2-modify-instance-attribute --instance-type t1.micro ${IID}
  ec2-start-instances ${IID}

Thierry (tbochud) wrote :

I confirm bug on maverick on i386 only. x86_64 running smoothly.

Tamas Herman (hermantamas) wrote :

Same thing on 32bit Lucid.

I tried to disable IPv6

sudo mv /lib/modules/2.6.32-308-ec2/kernel/net/ipv6{,.off}

because I saw references to it in the kernel oops message in the console log,
but it didn't make any difference.

Mike Hobbs (mhobbs) wrote :

As an additional FYI, I've received this exact same error in a running instance. That is, it not only happens during a java install. Java had already been installed and running fine, and then this error occurred while running the Jetty app server.

Mike Hobbs (mhobbs) wrote :

Sorry, I forgot to mention it was running Ubuntu 10.04 (ami-480df921) on a t1.micro.

cjp (cjp618) wrote :

Also occurs when trying to run OpenDS (Java LDAP server) on maverick/i386/t1.micro.

genewitch (genewitch) wrote :

This also affects the JRE for java6 packages. Reproducible.

crashlog http://paste.pocoo.org/show/309055/ (AMI console. the terminal is completely unresponsive, so typing /sbin/halt or any other such thing is a non-starter)

This also is NEARLY unsearchable, because it drops you into "developer resources" by default. yeah, i can find my way here, but i've been dealing with kernel bugs for 15 years. How is Average Joe going to find this page?

Tim Frosh (timfrosh) wrote :

Hi all,

I did what Daniel suggested:
"Launch the instance as m1.small, then install java, and save as a different
ami. From then, you can launch the saved ami as t1.micro. "

but after that I was unable to connect to newly launched instance. The error I get is connection refused.
I tried with the same security group, then also tried with a new security group w/o luck.

Anyone facing this behaviour? Please suggest something, I'm new to ubuntu and amazon.

Vlad (vladgh) wrote :

One downside I discovered by using this solution (stop the instance - make it a small - update java - stop it again - modify it back to micro) is that the new instances start with new private ips. That means that for the 5 servers I have I need to modify the nagios configs, firewalls, munin and so on. The elastic ip part is helping me a lot but not for the private pools and the communication between them.

Eric Hammond (esh) wrote :

Vlad: Off topic for the bug, but see: http://alestic.com/2009/06/ec2-elastic-ip-internal

Ed Anuff (ed-anuff) wrote :

Even with the workaround, the resulting java install appears somewhat unstable and will die with the same error at random times. Using 32bit Maverick on two different micro instances.

Scott Moser (smoser) wrote :

I've just done 2 things:
a.) verified this bug is present in natty at vmlinuz-2.6.38-2-virtual
b.) some task editing.

The task editing was mostly to just tag which releases this affects. Basically:
linux-ec2: lucid
linux-virtual: maverick, natty

Any other tasks can be closed as 'invalid' or anything. The kernel is really expected to be what is buggy, and people have reported java unstable-ness even after the stop-start-as-m1.small-stop-start-as-t1.micro workaround.

Scott Moser (smoser) on 2011-02-10
summary: - t1.micro instance hangs when installing sun java
+ t1.micro instance hangs when installing java
Peter Deak (peter-tpld) wrote :

Does anyone know how to recover from this issue temporarily? After requesting the install once, no matter what apt-get command I do, I get:

E: dpkg was interrupted, you must manually run 'sudo dpkg --configure -a' to correct the problem.

When I run that command, the hang occurs again and again. I cannot install or update anything else.

@Peter The workaround described in comment #13 will probably help you. Don't run 'sudo dpkg --configure -a' until the instance is started as a small.

Mike Hobbs (mhobbs) wrote :

@Peter If you only want to pay the cost of a micro instance, the workaround that worked best for me is to use a 64-bit Ubuntu image, which runs in a micro instance as well. For our apps, though, there is no sensitivity whether Java is running 64-bit or 32-bit, so YMMV. Running 64-bit also causes more memory pressure since a 64-bit app uses more memory.

Oleksandr Chugai (chugai) wrote :

I had to run 64-bit instance and run java with -XX:+UseCompressedOops to reduce memory overhead.

Michael Vogt (mvo) wrote :

Closing the java tasks (as per comment #31)

Changed in sun-java6 (Ubuntu Lucid):
status: New → Invalid
Changed in sun-java6 (Ubuntu Maverick):
status: New → Invalid
Changed in sun-java6 (Ubuntu Natty):
status: New → Invalid
Changed in linux-ec2 (Ubuntu Natty):
status: Confirmed → Invalid
Scott Moser (smoser) on 2011-02-25
Changed in openjdk-6 (Ubuntu Lucid):
status: New → Incomplete
status: Incomplete → Invalid
Changed in linux-ec2 (Ubuntu Maverick):
status: New → Invalid
Changed in openjdk-6 (Ubuntu Maverick):
status: New → Invalid
Changed in openjdk-6 (Ubuntu Natty):
status: New → Invalid
Andy Whitcroft (apw) on 2011-02-28
Changed in linux (Ubuntu Lucid):
status: New → Invalid
Scott Moser (smoser) on 2011-02-28
Changed in linux-ec2 (Ubuntu Lucid):
importance: Undecided → Medium
status: New → Confirmed
Brian (brian-bianco) wrote :

I'm surprised this isn't higher priority. This will kill an entire instance, and SUN JRE is useful if you plan on trying to use the RDS command line tools.

Changed in linux (Ubuntu Natty):
assignee: nobody → Canonical Kernel Team (canonical-kernel-team)
Yuval Adam (yuv-adm) wrote :

Until this bug is fixed, any help on how to fix a corrupted apt-get/dpkg state would be very helpful.
Basically, after trying to install the JRE, apt-get is unable to install any other package.

mr.b (mr.b) wrote :

I'd like to confirm that workaround "works". I say "works", because, while it allows dpkg to finish installing package, when instance type is reverted to t1.micro, instance is unable to run anything java-based.

mr.b (mr.b) wrote :

On how to fix corrupted apt-get/dpkg:

1. Run: dpkg -l |grep -v -E "^ii"

This lists all packages that are not in a cleanly installed state: either incompletely installed, or removed with config files left behind (or some other variant). You can tell them apart by the value in the first column (rc = removed, conf-files remain; iU = install requested, unpacked). See the listing header ("dpkg -l | head") for detailed meanings.

2. Run: dpkg -P package_name [package_name2] [package_name3 ...]

This will purge those incomplete packages.

Personally, I have removed all listed packages (as they were all related to broken jre install), and dpkg was running without a problem afterwards.

mr.b (mr.b) wrote :

Also, if you wish to remove all listed packages, first you can run this, to see the list of packages-to-be-removed:

echo `dpkg --list | tail -n +6 | grep -v -E "^ii" | cut -d" " -f3`

After you are 100% sure that you know what exactly is going to be removed (so you don't get your system to even more corrupted state), then replace "echo" with "dpkg --purge".
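
A slightly more defensive way to build the same list is sketched below. It reads dpkg -l output on stdin and selects by the status column with awk instead of counting spaces with cut, which should also cope with three-character status codes. The function name not_fully_installed is hypothetical, and the sketch assumes the usual dpkg -l layout where a "+++-" separator line precedes the package rows.

```shell
#!/bin/bash
# Read `dpkg -l` output on stdin and print the names of packages whose
# status column is anything other than "ii". Header lines are skipped by
# waiting for the "+++-" separator that precedes the package rows.
not_fully_installed() {
    awk 'seen && NF >= 2 && $1 != "ii" { print $2 }
         /^\+\+\+-/                    { seen = 1 }'
}

# Typical use (review the output before purging anything):
#   dpkg -l | not_fully_installed
```

As with the one-liner above, inspect the list first and only then hand the names to "dpkg --purge".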

Stefan Bader (smb) wrote :

I just want to add the results of some quick tests I did for Natty. The test (as it was described above) was to install openjdk-6-jdk (there is no sun package for Natty, yet). I tried on two us-east-1 t1.micro 32bit instances and one us-west-1 64bit instance, plus on a 32bit instance that used the same disk image and kernel (2.6.38-7.49) as ec2.

The ec2 results are exactly as they were:
32bit: hang with the xen_extend_mmu_update failing with rc=-22
64bit: success (as it does not call the same post-install code)

However my 32bit test system ran without any issues as well. Unfortunately there are two major differences:

EC2: Xen version: 3.1.2-128.1.10.el5 (preserve-AD), CPU Intel Xeon E5430
Local: Xen version: 3.4.3 (preserve-AD), CPU AMD Opteron 6128

So there could be two possible origins of the problem: the cpu specific mmu code or the hypervisor itself.

Stefan Bader (smb) wrote :

Ok, so same test environment moved to an Intel, i7 920 box and the openjdk install works as well. That seems to point strongly into some interaction issue with the hypervisor/dom0. Hm, I thought I went for CentOS 5.4 but apparently I ended up with 5.5 running a 2.6.18-194.32.1.el5xen kernel in dom0.

Stefan Bader (smb) wrote :

I repeated the Natty test with some extended debugging on another micro instance. The (annotated) output is below. As far as I understand it xen_set_pud() initiates a mmu_update multicall but by the time it gets flushed, there is still only one call present. This is a MMU_UPDATE hypervisor call. If I understand the description right, the lower two bits not set in ptr mean a normal PT update. The hypercall itself succeeds but the result is -EINVAL. There are many repetitions of this in the log. The ptr address seems to be the same in all cases, but val is always different.

 [ 178.878707] [<c0104c57>] ? xen_mc_flush+0x137/0x230
 [ 178.878710] [<c01068fd>] ? xen_set_pud_hyper+0x7d/0x80
 [ 178.878712] [<c0106947>] ? xen_set_pud+0x47/0x60
 [ 178.878715] [<c013a715>] ? pud_populate+0x45/0x90
 [ 178.878717] [<c020af53>] ? __pmd_alloc+0x73/0x90
 [ 178.878720] [<c020bbb4>] ? handle_mm_fault+0x174/0x190
 [ 178.878722] [<c0639b40>] ? do_page_fault+0x0/0x490
 [ 178.878725] [<c0639c9e>] ? do_page_fault+0x15e/0x490
 [ 178.878728] [<c0210133>] ? sys_mmap_pgoff+0x73/0x1d0
 [ 178.878731] [<c021050b>] ? sys_munmap+0x4b/0x60
 [ 178.878733] [<c0639b40>] ? do_page_fault+0x0/0x490
 [ 178.878736] [<c0636eaf>] ? error_code+0x67/0x6c
 [ 178.878739] call 1/1: op=1 result=-22 (-EINVAL) xen_extend_mmu_update+0x4a/0x70
 [ 178.878740] arg[0] = e620d960 (address of argument array)
 [ 178.878742] arg[1] = 1 (one pair of arguments)
 [ 178.878743] arg[2] = 0
 [ 178.878744] arg[3] = 7ff0 (DOMID_SELF)

ptr = 0x00000001 58b59008
val= 0x00000001 4c01f001
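
The two numeric readings used in this thread can be checked with a little shell arithmetic: the EAX value ffffffea from the original Lucid oops read as a signed 32-bit integer, and the low two bits of the ptr value above. The helper names are hypothetical, and only the low 32-bit half of ptr (58b59008) is used, since the low two bits live there.

```shell
#!/bin/bash
# Interpret a 32-bit register value (hex digits, no 0x prefix) as signed.
to_signed32() {
    v=$(( 0x$1 ))
    if [ "$v" -ge 2147483648 ]; then
        v=$(( v - 4294967296 ))
    fi
    echo "$v"
}

# Low two bits of the mmu_update ptr field; 0 means a normal PT update.
mmu_cmd_bits() {
    echo $(( $1 & 3 ))
}

to_signed32 ffffffea      # EAX from the Lucid oops: prints -22 (EINVAL)
mmu_cmd_bits 0x58b59008   # low half of ptr above: prints 0
```

Both results are consistent with the description: the same -EINVAL appears in the Lucid register dump and in the Natty multicall result, and the request is a plain page-table update.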

Stefan Bader (smb) wrote :

Now I know how I ended up with CentOS 5.5... yum upgrade is probably more than I intended to do. So at least that gives one more test: CentOS 5.5 + Xen 3.0.3 which seems to work as well with the current Natty test. Which would leave the dom0 kernel as the place a fix would need to go... Here 2.6.18-194.32.1.el5xen was used.

Angus Fox (angusf) wrote :

I have this exact issue with the bitnami cloud joomla image.

i686 GNU/Linux
Ubuntu 10.04.2 LTS
*** Welcome to the BitNami Joomla! Stack 1.5.23-0 ***
*** Please visit http://bitnami.org/faq/cloud_images

It fails installing the Amazon EC2 tools, at the point of installing openjdk. The machine hangs, and a reboot from the AWS console is the only solution. The rebooted image is unstable and in a partially installed state. I can't back it up to S3 without installing the tools, and I can't install the tools to back it up.

Pretty disappointing, given this is my first exploration of using EC2 for some light content management with the free instance and backing it up is all I need to do before going live and seeing how it goes.

Just to echo #37 above that this is surprisingly low priority given it has been around for months and will hit most free tier AWS users experimenting with the platform.

Tobias Kuhn (tkuhn) wrote :

I also think that this is a very severe bug. Can it be expected to be fixed for the Natty release?

Changed in linux (Ubuntu Natty):
milestone: none → natty-updates
Andy Whitcroft (apw) on 2011-04-27
Changed in ubuntu-release-notes:
assignee: nobody → Andy Whitcroft (apw)
status: New → In Progress
Andy Whitcroft (apw) wrote :

Natty release note text added:

  "Installing Java in a 32 bit t1.micro instance will hang, as a work around you may install Java in a t1.small instance and then resize the instance (see comment #13 for details). (634487)"

Changed in ubuntu-release-notes:
status: In Progress → Fix Released

Please tell me this is not intended to be the fix, @Andy... If you have been following the discussion, you will know that the process of installing on a different size instance and resizing to a micro will cause random instability and crashes.

The method of changing the instance size should only be used to *uninstall java* once it has caused your instance to hang.

Can you please describe what the "Fix" will be for this issue? If it is a kernel update can you tell me when this fix will be backported to Lucid.

I second @Andrew Manson: the workaround is just that, it is not a fix. It's also been known for 6 months.

Gabriel Nell (gabriel-nell) wrote :

I hope that since it remains as "confirmed" in linux-ec2 and "new/confirmed" in linux that the Ubuntu team is still following up with the root cause, and the release notes being updated doesn't de-prioritize fixing the root issue. Is this understanding correct?

Scott Moser (smoser) on 2011-04-28
Changed in linux (Ubuntu Maverick):
status: New → Confirmed
Matt Wilson (msw-amazon) wrote :

I think that the root cause is a corrupted p2m_host[] list via a PV-GRUB bug. Updated PV-GRUB AKIs are now available. These can be used in us-east-1 to verify the fix:

32-bit: aki-805ea7e9
64-bit: aki-825ea7eb

Wolfgang Nagele (mail-wnagele) wrote :

Does not seem to fix the problem. Tested using Maverick (ami-ccf405a5) and aki-805ea7e9; it still triggers a 100% CPU state when installing the Sun JDK.

Ben Howard (utlemming) wrote :

The stack trace is different though:

Jun 21 17:35:36 ip-10-244-167-3 kernel: [ 110.935506] ------------[ cut here ]------------
Jun 21 17:35:36 ip-10-244-167-3 kernel: [ 110.935508] WARNING: at /build/buildd/linux-2.6.38/arch/x86/xen/multicalls.c:182 xen_mc_flush+0x1a8/0x1b0()
Jun 21 17:35:36 ip-10-244-167-3 kernel: [ 110.935510] Modules linked in: acpiphp
Jun 21 17:35:36 ip-10-244-167-3 kernel: [ 110.935513] Pid: 1579, comm: java Tainted: G W 2.6.38-8-virtual #42-Ubuntu
Jun 21 17:35:36 ip-10-244-167-3 kernel: [ 110.935514] Call Trace:
Jun 21 17:35:36 ip-10-244-167-3 kernel: [ 110.935517] [<c0158b52>] ? warn_slowpath_common+0x72/0xa0
Jun 21 17:35:36 ip-10-244-167-3 kernel: [ 110.935519] [<c0104cc8>] ? xen_mc_flush+0x1a8/0x1b0
Jun 21 17:35:36 ip-10-244-167-3 kernel: [ 110.935522] [<c0104cc8>] ? xen_mc_flush+0x1a8/0x1b0
Jun 21 17:35:36 ip-10-244-167-3 kernel: [ 110.935524] [<c0158ba2>] ? warn_slowpath_null+0x22/0x30
Jun 21 17:35:36 ip-10-244-167-3 kernel: [ 110.935527] [<c0104cc8>] ? xen_mc_flush+0x1a8/0x1b0
Jun 21 17:35:36 ip-10-244-167-3 kernel: [ 110.935529] [<c010538a>] ? xen_extend_mmu_update+0x4a/0x70
Jun 21 17:35:36 ip-10-244-167-3 kernel: [ 110.935532] [<c010687d>] ? xen_set_pud_hyper+0x7d/0x80
Jun 21 17:35:36 ip-10-244-167-3 kernel: [ 110.935534] [<c01068c7>] ? xen_set_pud+0x47/0x60
Jun 21 17:35:36 ip-10-244-167-3 kernel: [ 110.935537] [<c013a695>] ? pud_populate+0x45/0x60
Jun 21 17:35:36 ip-10-244-167-3 kernel: [ 110.935539] [<c020aff3>] ? __pmd_alloc+0x73/0x90
Jun 21 17:35:36 ip-10-244-167-3 kernel: [ 110.935542] [<c020bc54>] ? handle_mm_fault+0x174/0x190
Jun 21 17:35:36 ip-10-244-167-3 kernel: [ 110.935544] [<c0639d00>] ? do_page_fault+0x0/0x490
Jun 21 17:35:36 ip-10-244-167-3 kernel: [ 110.935547] [<c0639e5e>] ? do_page_fault+0x15e/0x490
Jun 21 17:35:36 ip-10-244-167-3 kernel: [ 110.935549] [<c02101d3>] ? sys_mmap_pgoff+0x73/0x1d0
Jun 21 17:35:36 ip-10-244-167-3 kernel: [ 110.935552] [<c02105ab>] ? sys_munmap+0x4b/0x60
Jun 21 17:35:36 ip-10-244-167-3 kernel: [ 110.935554] [<c0639d00>] ? do_page_fault+0x0/0x490
Jun 21 17:35:36 ip-10-244-167-3 kernel: [ 110.935557] [<c063706f>] ? error_code+0x67/0x6c
Jun 21 17:35:36 ip-10-244-167-3 kernel: [ 110.935558] ---[ end trace b828e3a0dda720b7 ]---
Jun 21 17:35:36 ip-10-244-167-3 kernel: [ 110.935567] 1 multicall(s) failed: cpu 0

Scott Moser (smoser) wrote :

I've moved the oneiric builds to using the new AKIs. If we find that we can fix this issue by some combination of kernel changes and pv-grub changes we can then apply that to the defaults for stable releases.

Tomorrow's oneiric builds should use these new AKIs.

Jonathan Wolter (jawolter) wrote :

If this is useful in diagnosing the problem or solution, I tried using nice when running the install: it worked 2 times and failed 2 times. The last time I added --adjustment=19, and it succeeded. Is it an I/O-related problem? Here's my full script.

#!/bin/bash
sudo add-apt-repository "deb http://archive.canonical.com/ natty partner"
sudo apt-get update
#Accept the Java license.
for i in bin jdk jre; do
  echo "sun-java6-$i shared/accepted-sun-dlj-v1-1 select true" | sudo debconf-set-selections
done
# convoluted way to install java. this seems to only work some times! Race condition?
# https://forums.aws.amazon.com/message.jspa?messageID=199841#199841
sudo nice --adjustment=19 apt-get install -y sun-java6-jre

Stefan Bader (smb) wrote :

I was finally able to recreate this behaviour locally. Using CentOS 5.3 as dom0, I took a current oneiric ec2 image for domU, started a javajdk-6-jdk install and got the endless loop of "multicall failed" messages in domU. Meanwhile I could see in dom0 that there was a matching stream of "mm.c:694:* Bad L3 flags 6" messages (6 == _PAGE_RW | _PAGE_USER). Using exactly the same domU image on a CentOS 5.4 or later dom0 works without problems.

Though I could not yet find out what exactly the change is between 5.3 and 5.4 that fixes the issue.
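
For anyone who wants to check whether a hung instance hit this particular failure, the log signatures quoted in this report can be searched for mechanically. The following is only a minimal sketch: the function name is made up for illustration, and the patterns are taken from the traces pasted in this thread (the domU "multicall(s) failed" and "kernel BUG at .../hypervisor.c" lines, and the dom0 "Bad L3 flags" line), so other kernel versions may word things differently.

```shell
#!/bin/bash
# Sketch: scan a saved console or kernel log for the failure signatures
# quoted in this bug report. The function name and the exit-code convention
# are illustrative assumptions, not part of any Ubuntu or Xen tool.
has_hypervisor_oops() {
  # Reads log text on stdin; returns 0 if any known signature is present.
  grep -E -q 'multicall\(s\) failed|kernel BUG at .*/arch/x86/mm/hypervisor\.c|Bad L3 flags'
}
```

It could be used as `dmesg | has_hypervisor_oops && echo "looks like bug 634487"`, or fed the console output saved from the EC2 console when the instance is no longer reachable.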

Stefan Bader (smb) wrote :

At last I think I can point at one single patch which would need to get applied to the kernel running as dom0 on Amazon EC2 hosts:

http://xenbits.xen.org/hg/staging/xen-3.1-testing.hg/rev/f1574ad9f702

I took the CentOS 5.3 kernel and compiled it as a base for the comparison. I installed it as the dom0 kernel and started an Oneiric domU with a setup similar to that of a t1.micro. 3 out of 3 test runs caused the multicall failure to show up.
Next I applied the patch above (I took the version from the CentOS 5.4 source, but it looks identical) and added it to the build. I then booted into the new dom0 kernel, and all three test runs successfully installed the java-jdk.

Ben Howard (utlemming) wrote :

Sent the patch to Amazon for consideration and comment.

I run into this with: apt-get install ec2-api-tools
You may want to let Amazon know. It's a bad user experience to hang at the EC2 tools installation.
Until I saw this bug report, I thought this was an example of the t1.micro instance CPU throttling:
http://docs.amazonwebservices.com/AWSEC2/latest/UserGuide/index.html?concepts_micro_instances.html

Kambiz Darabi (darabi) wrote :

I had the same problem on a hardy amd64 dom0 running xen 3.2 and kernel 2.6.24-21-xen when I tried to install sun-java6-jdk on an i386 lucid domU.

I can confirm that the xen patch mentioned by Stefan in comment 58 solved my problem.

Before patching, the installation of sun-java6-bin crashed reliably, and calling 'dpkg --configure -a' showed this:

-----
Setting up sun-java6-bin (6.21dlj-0ubuntu1~lucid1~ppa1) ...
update-alternatives: using /usr/lib/jvm/java-6-sun/jre/bin/ControlPanel to provide /usr/bin/ControlPanel (ControlPanel) in auto mode.
...
update-alternatives: using /usr/lib/jvm/java-6-sun/jre/lib/jexec to provide /usr/bin/jexec (jexec) in auto mode.
[ 51.946179] ------------[ cut here ]------------
[ 51.946188] kernel BUG at /build/buildd/linux-ec2-2.6.32/arch/x86/mm/hypervisor.c:461!
[ 51.946192] invalid opcode: 0000 [#1] SMP
[ 51.946197] last sysfs file: /sys/kernel/uevent_seqnum
[ 51.946200] Modules linked in: ipv6
[ 51.946204]
[ 51.946208] Pid: 836, comm: java Not tainted (2.6.32-317-ec2 #36-Ubuntu)
[ 51.946211] EIP: 0061:[<c01184c0>] EFLAGS: 00010282 CPU: 0
[ 51.946217] EIP is at T.571+0x150/0x180
[ 51.946219] EAX: ffffffea EBX: e1dc3e70 ECX: 00000002 EDX: 00000000
[ 51.946222] ESI: b70d8000 EDI: 00000003 EBP: e1dc3ebc ESP: e1dc3e60
[ 51.946225] DS: 007b ES: 007b FS: 00d8 GS: 00e0 SS: 0069
[ 51.946228] Process java (pid: 836, ti=e1dc2000 task=e19f72f0 task.ti=e1dc2000)
[ 51.946231] Stack:
[ 51.946233] c1df9960 e1dc3ee0 80000000 00000061 0000000e 00000000 e1a4b000 b70d8061
[ 51.946243] <0> 80000003 00000000 00000000 00000040 00000001 ffffffea e1dc3ee0 00000001
[ 51.946253] <0> 00000000 00007ff0 00000000 00000000 b70d8001 00000003 c233a000 e1dc3efc
[ 51.946265] Call Trace:
[ 51.946271] [<c011884d>] ? xen_l3_entry_update+0x12d/0x1b0
[ 51.946276] [<c011454e>] ? pud_populate+0x9e/0xc0
[ 51.946281] [<c01af189>] ? __pmd_alloc+0x99/0xa0
[ 51.946285] [<c01b35b9>] ? handle_mm_fault+0x509/0x5b0
[ 51.946290] [<c01b833a>] ? do_munmap+0x23a/0x2d0
[ 51.946295] [<c0537d59>] ? do_page_fault+0x119/0x340
[ 51.946300] [<c0537c40>] ? do_page_fault+0x0/0x340
[ 51.946304] [<c0536185>] ? error_code+0x3d/0x44
----

I applied the patch to the sources of xen-hypervisor-3.2_3.2.0_amd64.deb, rebooted with the patched hypervisor and the old dom0 kernel (2.6.24-21-xen).

The dpkg --configure step didn't crash the domU, and I can run tomcat6. Up to now, there has been no heavy load on the instance. I'll report back if I experience crashes.

Thank you, Stefan.

Ed Anuff (ed-anuff) wrote :

Does anyone know if Amazon has incorporated the patch? Is this still an open bug?

Eugen Paraschiv (hanriseldon) wrote :

We are also having the same problem with a micro machine. Any news on this?
Thank you.

Adam (crudbug) wrote :

We are also seeing this issue. [Command Log]

Updating category cmap..

Setting up openjdk-6-jre-lib (6b20-1.9.9-0ubuntu1~10.04.2) ...

Setting up openjdk-6-jre-headless (6b20-1.9.9-0ubuntu1~10.04.2) ...
update-alternatives: using /usr/lib/jvm/java-6-openjdk/jre/bin/java to provide /usr/bin/java (java) in auto mode.
update-alternatives: using /usr/lib/jvm/java-6-openjdk/jre/bin/keytool to provide /usr/bin/keytool (keytool) in auto mode.
update-alternatives: using /usr/lib/jvm/java-6-openjdk/jre/bin/pack200 to provide /usr/bin/pack200 (pack200) in auto mode.
update-alternatives: using /usr/lib/jvm/java-6-openjdk/jre/bin/rmid to provide /usr/bin/rmid (rmid) in auto mode.
update-alternatives: using /usr/lib/jvm/java-6-openjdk/jre/bin/rmiregistry to provide /usr/bin/rmiregistry (rmiregistry) in auto mode.
update-alternatives: using /usr/lib/jvm/java-6-openjdk/jre/bin/unpack200 to provide /usr/bin/unpack200 (unpack200) in auto mode.
update-alternatives: using /usr/lib/jvm/java-6-openjdk/jre/bin/orbd to provide /usr/bin/orbd (orbd) in auto mode.
update-alternatives: using /usr/lib/jvm/java-6-openjdk/jre/bin/servertool to provide /usr/bin/servertool (servertool) in auto mode.
update-alternatives: using /usr/lib/jvm/java-6-openjdk/jre/bin/tnameserv to provide /usr/bin/tnameserv (tnameserv) in auto mode.
update-alternatives: using /usr/lib/jvm/java-6-openjdk/jre/lib/jexec to provide /usr/bin/jexec (jexec) in auto mode.

[Hangs here]

Any updates?

Ben Howard (utlemming) wrote :

We have investigated the problem and can confirm that the bug lies in the underlying EC2 hypervisor; this bug cannot be fixed at the AMI level. The problem only affects launches on 32-bit t1.micro instances.

Canonical has been in contact with and remains in contact with Amazon regarding a final resolution of this issue. As more information becomes available, Canonical will share it with the community.

We recommend that users who need Java on 32-bit t1.micros launch an m1.small, install Java and then change the instance type to a t1.micro. This can be done by stopping the instance and modifying the instance type using either the AWS console or the command line tools. Users who have existing 32-bit t1.micros and need Java can use the same method to change their instance(s) to m1.smalls, install Java and then switch back to t1.micro.

However, we have had some reports that the workaround may not work for some users of 32-bit t1.micros. Users who use the workaround to install Java may still hit the hypervisor bug when running Java in production. Users who hit the bug after employing the workaround should consider using an m1.small.
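
The stop / change type / start sequence described above can be sketched as a small script. This is an illustration only, with several assumptions: it uses the present-day AWS CLI (`aws ec2 ...`), which postdates this report and is not the ec2-api-tools mentioned elsewhere in the thread; the function names and the DRY_RUN variable are hypothetical; and the instance must be EBS-backed, since only those can be stopped and restarted.

```shell
#!/bin/bash
# Sketch of the stop / change type / start workaround.
# Assumes the modern AWS CLI ("aws") is installed and configured.
# The names run, change_instance_type and DRY_RUN are hypothetical.

run() {
  # With DRY_RUN=1, print the command instead of executing it.
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

change_instance_type() {
  local instance_id="$1" new_type="$2"
  # The instance type can only be changed while the instance is stopped.
  run aws ec2 stop-instances --instance-ids "$instance_id" || return 1
  run aws ec2 wait instance-stopped --instance-ids "$instance_id" || return 1
  run aws ec2 modify-instance-attribute --instance-id "$instance_id" \
    --instance-type "Value=$new_type" || return 1
  run aws ec2 start-instances --instance-ids "$instance_id"
}
```

A dry run such as `DRY_RUN=1 change_instance_type i-12345678 m1.small` (the instance ID is a placeholder) prints the four commands without touching the instance. After installing Java on the m1.small, the same function would be called again with `t1.micro`, bearing in mind the caveat above that the bug may still be hit at run time.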

We are not aware of a specific time-line to permanently resolve the underlying hypervisor issue at this time. Users that need more specific information about this issue should contact Amazon Web Services directly.

Ben Howard (utlemming) wrote :

On 2011-11-08, Amazon launched the US-West-2 availability region. ( http://aws.typepad.com/aws/2011/11/now-open-us-west-portland-region.html )

This region appears to NOT be affected by the bug. If you are experiencing this bug and are not bound to a particular EC2 region, please try the US-West-2 region as a work around.

Daniel Sikar (dsikar) wrote :

"US-West-2 availability region" I suggest is a misnomer.
There are availability zones us-west-2a and us-west-2b in the us-west-2 region.

Daniel Sikar (dsikar) wrote :

PS Thanks for the heads up.

Ben Howard (utlemming) wrote :

Amazon has verified that this issue does not exist in the US-West-2 region and has suggested that affected users consider using US-West-2 as a viable and easy workaround. US-West-2 has the same pricing as US-East-1 for the t1.micro.

Gabriel Nell (gabriel-nell) wrote :

Ben, do you have a link to Amazon confirming this issue being fixed in US-WEST-2? Is it on the AWS developer forums? I would like to follow up to see if they could then fix it in other regions :)

Jeff Bauer (jbauer) wrote :

Just confirming that it's working for my instances launched in us-west-2.

I've experienced this exact issue on t1.micro instances in eu-west-1 running erlang and postgresql. Java isn't even installed. Does Amazon have a publicly accessible page tracking the issue and the rollout of the fix to regions other than us-west-2?

Ben Howard (utlemming) wrote :

Re comment#72, can you file a new bug for erlang and postgresql?

Clint Byrum (clint-fewbar) wrote :

I believe that the erlang issue is something different entirely, so would also like to see another bug report filed. I have experienced some lockup/CPU drag oddness with erlang on c1.xlarge's, not just t1.micro.. but haven't had the time to debug it just yet. Matt, please post back here with the bug # you file, or I will do so if I get around to it soon.

In reply to comments #73 and #74, I believe my erlang issue to be identical to the java issue in this report because the kernel oops is identical. Do you have the same oops or a different one? Is there any benefit to filing this against erlang and postgresql now the underlying hypervisor issue has been found/fixed? In principle practically any package could provoke this oops.

Does anyone have a pointer to the corresponding Amazon issue?

Scott Moser (smoser) wrote :

Matt, I suspect you are correct that this is the same issue hit another way.
I also suspect that if we had a tiny C test case that illustrated the issue (which I've never gone after) we'd have a much better likelihood of finding a guest or host level workaround. However, I think most parties involved have just accepted that the hypervisor-level fix is what is necessary, and have not pursued this further because it only (largely) affects java on t1.micro (which is not a terribly useful scenario given the CPU and memory constraints).

There is no public amazon issue that I am aware of.

Torne Wuff (torne) wrote :

I have this same issue running node.js on a t1.micro in eu-west-1a, identical kernel oops as Matt notes for erlang/postgresql. Every invocation of node immediately goes into this state. :/

Arne de Bree (ak-arne) wrote :

Same experience here. Please be aware that zone naming is specific to an account and not global, so what you see as eu-west-1a can be 1b or 1c for me. There is no consistency between accounts.

I worked around this by starting 3 instances, one per zone, and found that only one of the 3 EU zones allowed me to install and run npm. In the zone where I got a working installation I was able to start multiple t1.micro nodes and install and run node on them. In the other 2 zones even new instances gave me the same non-working result.

So I guess it has to do with something specific to the zone.

dino99 (9d9) wrote :
Changed in linux (Ubuntu Natty):
status: Confirmed → Invalid
Changed in linux (Ubuntu Maverick):
status: Confirmed → Invalid

Closing only the linux (Ubuntu) task as Java was removed from the Ubuntu repositories due to Oracle dissolving the Operating System Distributor License for Java [1][2], Oracle has sunsetted Java 6 for the general public [3], and this issue being reproducible in linux-ec2.

[1] http://robilad.livejournal.com/90792.html
[2] https://java.net/projects/jdk-distros/
[3] http://www.oracle.com/technetwork/java/eol-135779.html

Changed in linux (Ubuntu):
assignee: Canonical Kernel Team (canonical-kernel-team) → nobody
importance: High → Undecided
milestone: natty-updates → none
status: Confirmed → Invalid