m1.large instances randomly freezing for 5-15 minutes
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
linux-ec2 (Ubuntu) | Confirmed | Undecided | Unassigned |
Bug Description
I have noticed strange behavior since we started running m1.large instances with AMI ami-fa01f193.
From time to time (1 to 3 times a day), the instance becomes unresponsive for 3 to 15 minutes. Our application, running on the instance, stops responding to requests. However, some lightweight processes (such as collectd) keep running, and we continue to receive their statistics.
Here is a description of what happens:
* The system is running normally (load does not seem to be a factor; it has happened on lightly loaded servers).
* Suddenly, one of the CPUs gets stuck at 100%, with a large proportion of system CPU time (see the attached collectd capture).
* Applications become completely unresponsive.
* The network does *not* stop completely (we keep receiving collectd statistics).
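To log the pattern above (one CPU pinned at 100%, mostly in system time) with timestamps independent of the collectd graphs, one option is to diff the per-CPU counters in `/proc/stat` between two samples. This is a minimal sketch under stated assumptions, not something taken from the report: it assumes the standard Linux `/proc/stat` layout (`cpuN user nice system idle iowait irq softirq steal ...`), and the names `parse_proc_stat`, `stuck_cpus`, `sample` and `monitor`, the 90% threshold and the 5-second interval are hypothetical choices.

```python
import time


def parse_proc_stat(text):
    """Return {cpu_name: (system_ticks, total_ticks)} for the per-CPU lines of /proc/stat."""
    result = {}
    for line in text.splitlines():
        fields = line.split()
        # Skip non-CPU lines and the aggregate "cpu" line; keep "cpu0", "cpu1", ...
        if not fields or not fields[0].startswith("cpu") or fields[0] == "cpu":
            continue
        # user, nice, system, idle, iowait, irq, softirq, steal
        # (guest time is already included in user, so it is not added again).
        values = [int(v) for v in fields[1:9]]
        result[fields[0]] = (values[2], sum(values))
    return result


def stuck_cpus(before, after, threshold=0.9):
    """Return {cpu: share} for CPUs whose system-time share between two samples is >= threshold."""
    flagged = {}
    for cpu, (sys_after, total_after) in after.items():
        if cpu not in before:
            continue
        sys_before, total_before = before[cpu]
        d_total = total_after - total_before
        if d_total <= 0:
            continue  # no ticks elapsed for this CPU; nothing to judge
        share = (sys_after - sys_before) / d_total
        if share >= threshold:
            flagged[cpu] = share
    return flagged


def sample(path="/proc/stat"):
    """Read and parse the current per-CPU counters."""
    with open(path) as f:
        return parse_proc_stat(f.read())


def monitor(interval=5.0, threshold=0.9):
    """Print a timestamped line whenever a CPU spends >= threshold of an interval in system time."""
    prev = sample()
    while True:
        time.sleep(interval)
        cur = sample()
        for cpu, share in sorted(stuck_cpus(prev, cur, threshold).items()):
            stamp = time.strftime("%Y-%m-%d %H:%M:%S")
            print(f"{stamp} {cpu}: {share:.0%} system time", flush=True)
        prev = cur
```

Calling `monitor()` (for example under `nohup` with output redirected to a file) on an affected instance would leave a record of when a CPU was pinned in system time, which could then be compared against the kernel version each instance runs.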
How to reproduce: there is no deterministic way; just wait.
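Since there is no deterministic reproducer, it may help to timestamp each freeze automatically so it can be lined up with the collectd graphs and kernel logs. A minimal sketch, assuming a heartbeat process that sleeps for a fixed interval and treats any much longer gap as a stall; `find_stalls`, `heartbeat`, the 2-second tolerance and `stalls.log` are hypothetical. One caveat: the report says some lightweight processes (collectd) kept running during the freezes, so a heartbeat like this only records a stall if it is itself delayed.

```python
import time


def find_stalls(timestamps, interval, tolerance=2.0):
    """Return (start, gap) for each pair of consecutive ticks further apart than interval + tolerance."""
    stalls = []
    for earlier, later in zip(timestamps, timestamps[1:]):
        gap = later - earlier
        if gap > interval + tolerance:
            stalls.append((earlier, gap))
    return stalls


def heartbeat(interval=1.0, tolerance=2.0, log_path="stalls.log"):
    """Sleep in a loop and append a line to log_path whenever a tick arrives much later than expected."""
    last = time.monotonic()
    while True:
        time.sleep(interval)
        now = time.monotonic()
        gap = now - last
        if gap > interval + tolerance:
            # gap - interval is the extra delay beyond the requested sleep.
            stamp = time.strftime("%Y-%m-%d %H:%M:%S")
            with open(log_path, "a") as log:
                log.write(f"{stamp} stalled for ~{gap - interval:.1f}s\n")
        last = now
```

`find_stalls` applies the same rule offline to a list of recorded tick times, for example ones collected by an application's own periodic log lines.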
I have only tested this in the us-east-1 region, but it occurs on instances in all availability zones.
We run different software stacks (Python, Java), and all of them have been affected.
I started other instances from an AMI we have trusted for a long time (ami-da0cf8b3, kernel 2.6.32-309-ec2), and after more than 24 hours the issue has not appeared on them, while it keeps appearing on the ami-fa01f193 instances.
I am currently testing kernel 2.6.32-314-ec2 to see if it causes the same behavior.
ProblemType: Bug
DistroRelease: Ubuntu 10.04
Package: linux-ec2 2.6.32.312.13
ProcVersionSign
Uname: Linux 2.6.32-312-ec2 x86_64
Architecture: amd64
Date: Wed Mar 23 14:06:15 2011
Ec2AMI: ami-fa01f193
Ec2AMIManifest: ubuntu-
Ec2Availability
Ec2InstanceType: m1.large
Ec2Kernel: aki-427d952b
Ec2Ramdisk: unavailable
ProcEnviron:
LANG=en_US.UTF-8
SHELL=/bin/bash
SourcePackage: linux-meta-ec2
Changed in linux-ec2 (Ubuntu):
status: New → Confirmed
I've been experiencing the same symptoms, though I believe they're related to Java. We're using sun-java6 with kernel 2.6.32-305-ec2 (and also 2.6.32-314-ec2) on Ubuntu 10.04.1 x86_64. The problem seems to be exacerbated under load (e.g. when running Hadoop jobs).
This thread (http://<email address hidden>/msg08703.html) seems related.