Booting up more than one instance on a fresh image fails

Bug #1014227 reported by Sam Morrison
This bug affects 1 person
Affects: OpenStack Compute (nova)
Status: Fix Released
Importance: High
Assigned to: Michael Still

Bug Description

If I create a new image in glance and then boot more than one instance from that image, all but one of them fail.
I'm guessing this has something to do with the image being downloaded and resized.

Thierry Carrez (ttx) wrote :

Do you have logs pointing to the nature of the failure, so that we can confirm your guess?

Changed in nova:
status: New → Incomplete
Sam Morrison (sorrison) wrote :

OK I have tracked down the issue.

This only happens when you are using NFS for /var/lib/nova/instances/

It will work on the first node that is scheduled to build an instance but not on the second.

The error is as follows (this is with the latest version of nova in Ubuntu Precise):

2012-06-19 01:24:31 ERROR nova.rpc.amqp [req-3aac4a94-399f-4b8e-828c-2eaa205cb131 c57d404de8404a0fa238daff8c137960 a43f0f85e1384683a388b974145589c6] Exception during message handling
2012-06-19 01:24:31 TRACE nova.rpc.amqp Traceback (most recent call last):
2012-06-19 01:24:31 TRACE nova.rpc.amqp File "/usr/lib/python2.7/dist-packages/nova/rpc/amqp.py", line 252, in _process_data
2012-06-19 01:24:31 TRACE nova.rpc.amqp rval = node_func(context=ctxt, **node_args)
2012-06-19 01:24:31 TRACE nova.rpc.amqp File "/usr/lib/python2.7/dist-packages/nova/exception.py", line 114, in wrapped
2012-06-19 01:24:31 TRACE nova.rpc.amqp return f(*args, **kw)
2012-06-19 01:24:31 TRACE nova.rpc.amqp File "/usr/lib/python2.7/dist-packages/nova/compute/manager.py", line 177, in decorated_function
2012-06-19 01:24:31 TRACE nova.rpc.amqp sys.exc_info())
2012-06-19 01:24:31 TRACE nova.rpc.amqp File "/usr/lib/python2.7/contextlib.py", line 24, in __exit__
2012-06-19 01:24:31 TRACE nova.rpc.amqp self.gen.next()
2012-06-19 01:24:31 TRACE nova.rpc.amqp File "/usr/lib/python2.7/dist-packages/nova/compute/manager.py", line 171, in decorated_function
2012-06-19 01:24:31 TRACE nova.rpc.amqp return function(self, context, instance_uuid, *args, **kwargs)
2012-06-19 01:24:31 TRACE nova.rpc.amqp File "/usr/lib/python2.7/dist-packages/nova/compute/manager.py", line 651, in run_instance
2012-06-19 01:24:31 TRACE nova.rpc.amqp do_run_instance()
2012-06-19 01:24:31 TRACE nova.rpc.amqp File "/usr/lib/python2.7/dist-packages/nova/utils.py", line 945, in inner
2012-06-19 01:24:31 TRACE nova.rpc.amqp retval = f(*args, **kwargs)
2012-06-19 01:24:31 TRACE nova.rpc.amqp File "/usr/lib/python2.7/dist-packages/nova/compute/manager.py", line 650, in do_run_instance
2012-06-19 01:24:31 TRACE nova.rpc.amqp self._run_instance(context, instance_uuid, **kwargs)
2012-06-19 01:24:31 TRACE nova.rpc.amqp File "/usr/lib/python2.7/dist-packages/nova/compute/manager.py", line 451, in _run_instance
2012-06-19 01:24:31 TRACE nova.rpc.amqp self._set_instance_error_state(context, instance_uuid)
2012-06-19 01:24:31 TRACE nova.rpc.amqp File "/usr/lib/python2.7/contextlib.py", line 24, in __exit__
2012-06-19 01:24:31 TRACE nova.rpc.amqp self.gen.next()
2012-06-19 01:24:31 TRACE nova.rpc.amqp File "/usr/lib/python2.7/dist-packages/nova/compute/manager.py", line 432, in _run_instance
2012-06-19 01:24:31 TRACE nova.rpc.amqp self._deallocate_network(context, instance)
2012-06-19 01:24:31 TRACE nova.rpc.amqp File "/usr/lib/python2.7/contextlib.py", line 24, in __exit__
2012-06-19 01:24:31 TRACE nova.rpc.amqp self.gen.next()
2012-06-19 01:24:31 TRACE nova.rpc.amqp File "/usr/lib/python2.7/dist-packages/nova/compute/manager.py", line 429, in _run_instance
2012-06-19 01:24:31 TRACE nova.rpc.amqp injected_files, admin_password)
2012-06-19 01:24:31 TRACE nova.rpc.amqp File "...


Changed in nova:
status: Incomplete → New
Michael Still (mikal)
Changed in nova:
assignee: nobody → Michael Still (mikalstill)
Michael Still (mikal)
Changed in nova:
status: New → Confirmed
importance: Undecided → High
Michael Still (mikal) wrote :

Sorry for the delay on this one. I have what I believe is a fix queued up, but I need some dependent code to be merged first. I've sent that off for review and it shouldn't take too long.

OpenStack Infra (hudson-openstack) wrote : Fix proposed to nova (master)

Fix proposed to branch: master
Review: https://review.openstack.org/12024

Changed in nova:
status: Confirmed → In Progress
OpenStack Infra (hudson-openstack) wrote : Fix merged to nova (master)

Reviewed: https://review.openstack.org/12024
Committed: http://github.com/openstack/nova/commit/1523fab5ee465df096b2c76d27b634c1f52aca77
Submitter: Jenkins
Branch: master

commit 1523fab5ee465df096b2c76d27b634c1f52aca77
Author: Michael Still <email address hidden>
Date: Thu Aug 23 22:43:06 2012 +1000

    External locking for image caching.

    If the instance storage is shared between compute nodes, then you
    need external locking which is also shared to avoid clobbering each
    other's attempts to cache base images. Resolves bug 1014227.

    Change-Id: Ic2ac87840904fa199c17774dae9556ad6c7a3eaf
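
For readers unfamiliar with the idea in the commit message, here is a rough sketch of external locking around a shared image cache. It is not the code from the nova change: the names external_lock and cache_base_image, the "locks" subdirectory and the ".part" suffix are all made up for illustration, and it is written for Python 3 rather than the Python 2.7 shown in the traceback. It assumes the lock directory sits on the same shared filesystem as the cache and that POSIX byte-range locks (fcntl.lockf) are actually honoured there, which on NFS depends on a working lock manager.

```python
import fcntl
import os
from contextlib import contextmanager


@contextmanager
def external_lock(lock_dir, name):
    """Hold an exclusive lock on <lock_dir>/<name>.lock while the block runs.

    Hypothetical helper: the lock is a file on disk, so it is visible to
    every process (and, on shared storage, every host) that opens the
    same path.
    """
    os.makedirs(lock_dir, exist_ok=True)
    path = os.path.join(lock_dir, name + ".lock")
    with open(path, "w") as handle:
        # Blocks until no other process holds the lock on this file.
        fcntl.lockf(handle, fcntl.LOCK_EX)
        try:
            yield path
        finally:
            fcntl.lockf(handle, fcntl.LOCK_UN)


def cache_base_image(base_dir, image_id, fetch):
    """Populate the shared cache entry for image_id at most once.

    fetch is a caller-supplied function that writes the image to the
    path it is given (an assumption made for this sketch).
    """
    target = os.path.join(base_dir, image_id)
    with external_lock(os.path.join(base_dir, "locks"), image_id):
        # Re-check under the lock: another node may have finished
        # caching the image while we were waiting.
        if not os.path.exists(target):
            partial = target + ".part"
            fetch(partial)
            # Rename only once the download is complete, so readers
            # never see a half-written base image.
            os.rename(partial, target)
    return target
```

With something like this, the second compute node would wait on the lock, find the base image already in place and skip the download instead of clobbering the first node's copy. Note that POSIX locks are per process, so this sketch does not serialize threads inside a single process.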

Changed in nova:
status: In Progress → Fix Committed
Thierry Carrez (ttx)
Changed in nova:
milestone: none → folsom-rc1
status: Fix Committed → Fix Released
Thierry Carrez (ttx)
Changed in nova:
milestone: folsom-rc1 → 2012.2