Ubuntu
nova package

a bad AMI can hang an entire compute node

Bug #960276 reported by Nick Moffitt on 2012-03-20

This bug affects 2 people

Affects        Status        Importance  Assigned to  Milestone
nova (Ubuntu)  Fix Released  Critical    Unassigned

Bug Description

Using the attached image (and others) causes the entire compute node to hang between the booting of the image and the configuration of networking. The running instance does produce a console ring buffer output file, but the output is problematic: there are lots of "NO PTY" errors, and it often looks as if the instance never got a proper root filesystem. The instance is also unpingable.

The only way to terminate these instances is to restart nova-compute so that it will collect AMQP messages again, and then send the terminate request. This seems suspiciously like the compute code is blocking in a libvirt call of some sort.

The cluster we used booted an older Oneiric image with no problems whatsoever.

This can effectively DoS an entire OpenStack installation through nothing more than running instances.

Attached is the amd64 image from http://cloud-images.ubuntu.com/precise/20120319/ which exhibited this problem in our rc1 cloud.

Nick Moffitt (nick-moffitt) wrote on 2012-03-20:

19 March 2012 Precise amd64 cloud-images.ubuntu.com image (210.9 MiB, application/x-tar)

James Troup (elmo) on 2012-03-20

tags:

added: canonistack

Nick Moffitt (nick-moffitt) wrote on 2012-03-20:

This does not cause libvirtd to hang, by the way. "sudo virsh list" does fine, and I'm able to kill instances manually with virsh destroy.

Launchpad Janitor (janitor) wrote on 2012-03-20:

Status changed to 'Confirmed' because the bug affects multiple users.

Changed in nova (Ubuntu):
status:	New → Confirmed

Adam Gandelman (gandelman-a) on 2012-03-20

Changed in nova (Ubuntu):
importance:	Undecided → Critical

Adam Gandelman (gandelman-a) wrote on 2012-03-20:

We've been carrying a nova patch (libvirt-use-console-pipe.patch) to resolve a possible DoS in Bug #832507. I've confirmed that this patch introduces a deadlock somewhere when the serial console gets spammed. Running 'dd if=/dev/urandom of=/dev/ttyS0 bs=1024 count=1500' from within the instance is enough to basically lock nova-compute until the KVM process is killed or nova-compute is restarted.
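The report does not pinpoint where the deadlock lives. As a hedged illustration of one way a pipe-backed console could stall its writer, the sketch below assumes the console pipe behaves like an ordinary POSIX pipe: once the kernel buffer is full and nothing drains it, further writes cannot proceed. This is not the nova patch or its code path; the function names are hypothetical, and the write end is made non-blocking only so the demo reports the stall instead of hanging.

```python
import os


def fill_pipe_without_reader(chunk_size=1024):
    """Write to a pipe that nobody drains until the kernel refuses more.

    Returns (bytes_accepted, read_fd, write_fd). chunk_size mirrors the
    bs=1024 used in the dd reproducer; it is otherwise arbitrary.
    """
    read_fd, write_fd = os.pipe()
    # Non-blocking so a full pipe raises BlockingIOError instead of hanging
    # this demo. A blocking writer would simply stop here.
    os.set_blocking(write_fd, False)
    chunk = b"\0" * chunk_size
    accepted = 0
    try:
        while True:
            accepted += os.write(write_fd, chunk)
    except BlockingIOError:
        pass  # pipe buffer is full
    return accepted, read_fd, write_fd


def writer_is_stalled(write_fd):
    """Return True if even a one-byte write cannot be accepted right now."""
    try:
        os.write(write_fd, b"x")
    except BlockingIOError:
        return True
    return False


def drain(read_fd):
    """Read everything currently buffered in the pipe; return the byte count."""
    os.set_blocking(read_fd, False)
    total = 0
    try:
        while True:
            data = os.read(read_fd, 65536)
            if not data:
                break  # write end closed
            total += len(data)
    except BlockingIOError:
        pass  # pipe is empty, write end still open
    return total


if __name__ == "__main__":
    accepted, r, w = fill_pipe_without_reader()
    print("bytes accepted before the writer stalled:", accepted)
    print("stalled with no reader:", writer_is_stalled(w))
    drain(r)
    print("stalled after draining:", writer_is_stalled(w))
    os.close(r)
    os.close(w)
```

If the real console path works anything like this, whichever side is responsible for reading the pipe has to keep draining it; a reader that waits on something else while the guest floods ttyS0 would leave both sides stuck.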

We either need to fix this patch ASAP or back it out in favor of a different solution for the original Bug #832507. This patch constitutes the biggest delta we maintain across any OpenStack component, and maintaining it so far has required a great deal of effort. The regression it introduces is worse than the original bug, IMO.

Adam Gandelman (gandelman-a) on 2012-03-27

Changed in nova (Ubuntu):
status:	Confirmed → Triaged

Chuck Short (zulcss) on 2012-03-30

Changed in nova (Ubuntu):
status:	Triaged → Fix Committed

Chuck Short (zulcss) on 2012-04-23

Changed in nova (Ubuntu):
status:	Fix Committed → Fix Released

James Page (james-page) wrote on 2012-04-23:

Fixed in 2012.1~rc2-0ubuntu1
