stable/newton: unable to spin up a kvm instance

Bug #1630912 reported by Gary Kotton
20
This bug affects 3 people
Affects Status Importance Assigned to Milestone
OpenStack Compute (nova)
Invalid
Undecided
Unassigned

Bug Description

Happens randowmly:

2016-10-06 20:22:18.100 1931 ERROR nova.compute.manager [req-50d2dc6d-072e-423e-9de0-096ba307411f admin demo] [instance: 04709c94-0ba3-4312-ad1c-5739e5600bf4] Instance failed to spawn
2016-10-06 20:22:18.100 1931 ERROR nova.compute.manager [instance: 04709c94-0ba3-4312-ad1c-5739e5600bf4] Traceback (most recent call last):
2016-10-06 20:22:18.100 1931 ERROR nova.compute.manager [instance: 04709c94-0ba3-4312-ad1c-5739e5600bf4] File "/opt/stack/nova/nova/compute/manager.py", line 2078, in _build_resources
2016-10-06 20:22:18.100 1931 ERROR nova.compute.manager [instance: 04709c94-0ba3-4312-ad1c-5739e5600bf4] yield resources
2016-10-06 20:22:18.100 1931 ERROR nova.compute.manager [instance: 04709c94-0ba3-4312-ad1c-5739e5600bf4] File "/opt/stack/nova/nova/compute/manager.py", line 1920, in _build_and_run_instance
2016-10-06 20:22:18.100 1931 ERROR nova.compute.manager [instance: 04709c94-0ba3-4312-ad1c-5739e5600bf4] block_device_info=block_device_info)
2016-10-06 20:22:18.100 1931 ERROR nova.compute.manager [instance: 04709c94-0ba3-4312-ad1c-5739e5600bf4] File "/opt/stack/nova/nova/virt/libvirt/driver.py", line 2571, in spawn
2016-10-06 20:22:18.100 1931 ERROR nova.compute.manager [instance: 04709c94-0ba3-4312-ad1c-5739e5600bf4] admin_pass=admin_password)
2016-10-06 20:22:18.100 1931 ERROR nova.compute.manager [instance: 04709c94-0ba3-4312-ad1c-5739e5600bf4] File "/opt/stack/nova/nova/virt/libvirt/driver.py", line 2960, in _create_image
2016-10-06 20:22:18.100 1931 ERROR nova.compute.manager [instance: 04709c94-0ba3-4312-ad1c-5739e5600bf4] image_id=disk_images['kernel_id'])
2016-10-06 20:22:18.100 1931 ERROR nova.compute.manager [instance: 04709c94-0ba3-4312-ad1c-5739e5600bf4] File "/opt/stack/nova/nova/virt/libvirt/imagebackend.py", line 218, in cache
2016-10-06 20:22:18.100 1931 ERROR nova.compute.manager [instance: 04709c94-0ba3-4312-ad1c-5739e5600bf4] *args, **kwargs)
2016-10-06 20:22:18.100 1931 ERROR nova.compute.manager [instance: 04709c94-0ba3-4312-ad1c-5739e5600bf4] File "/opt/stack/nova/nova/virt/libvirt/imagebackend.py", line 504, in create_image
2016-10-06 20:22:18.100 1931 ERROR nova.compute.manager [instance: 04709c94-0ba3-4312-ad1c-5739e5600bf4] prepare_template(target=base, *args, **kwargs)
2016-10-06 20:22:18.100 1931 ERROR nova.compute.manager [instance: 04709c94-0ba3-4312-ad1c-5739e5600bf4] File "/usr/local/lib/python2.7/dist-packages/oslo_concurrency/lockutils.py", line 264, in inner
2016-10-06 20:22:18.100 1931 ERROR nova.compute.manager [instance: 04709c94-0ba3-4312-ad1c-5739e5600bf4] do_log=False, semaphores=semaphores, delay=delay):
2016-10-06 20:22:18.100 1931 ERROR nova.compute.manager [instance: 04709c94-0ba3-4312-ad1c-5739e5600bf4] File "/usr/lib/python2.7/contextlib.py", line 17, in __enter__
2016-10-06 20:22:18.100 1931 ERROR nova.compute.manager [instance: 04709c94-0ba3-4312-ad1c-5739e5600bf4] return self.gen.next()
2016-10-06 20:22:18.100 1931 ERROR nova.compute.manager [instance: 04709c94-0ba3-4312-ad1c-5739e5600bf4] File "/usr/local/lib/python2.7/dist-packages/oslo_concurrency/lockutils.py", line 216, in lock
2016-10-06 20:22:18.100 1931 ERROR nova.compute.manager [instance: 04709c94-0ba3-4312-ad1c-5739e5600bf4] ext_lock.acquire(delay=delay)
2016-10-06 20:22:18.100 1931 ERROR nova.compute.manager [instance: 04709c94-0ba3-4312-ad1c-5739e5600bf4] File "/usr/local/lib/python2.7/dist-packages/fasteners/process_lock.py", line 151, in acquire
2016-10-06 20:22:18.100 1931 ERROR nova.compute.manager [instance: 04709c94-0ba3-4312-ad1c-5739e5600bf4] self._do_open()
2016-10-06 20:22:18.100 1931 ERROR nova.compute.manager [instance: 04709c94-0ba3-4312-ad1c-5739e5600bf4] File "/usr/local/lib/python2.7/dist-packages/fasteners/process_lock.py", line 123, in _do_open
2016-10-06 20:22:18.100 1931 ERROR nova.compute.manager [instance: 04709c94-0ba3-4312-ad1c-5739e5600bf4] self.lockfile = open(self.path, 'a')
2016-10-06 20:22:18.100 1931 ERROR nova.compute.manager [instance: 04709c94-0ba3-4312-ad1c-5739e5600bf4] IOError: [Errno 13] Permission denied: '/opt/stack/data/nova/instances/locks/nova-273d5645757056cdd056de4cfe9f121b9eee6ae3'
2016-10-06 20:22:18.100 1931 ERROR nova.compute.manager [instance: 04709c94-0ba3-4312-ad1c-5739e5600bf4]

Gary Kotton (garyk)
Changed in nova:
importance: Undecided → Critical
Revision history for this message
Matthew Booth (mbooth-9) wrote :

So this is getting EPERM trying to open a lock file. The lock file is for the entry in the image cache, so potentially shared amongst many instances. The specific failure is because we can't open it for append. I'm going to assume there are no bugs in the locking libraries. Immediate causes of the failure that I can think of could be:

* We don't have write access to /opt/stack/data/nova/instances/locks/

How is this directory created?
Is there any potential race during creation, e.g. create as root then chown?
Is it potentially created somewhere which gives it root ownership if it runs first?
Are we on NFS with messed up uids between different nodes?

* /opt/stack/data/nova/instances/locks/nova-273d5645757056cdd056de4cfe9f121b9eee6ae3 exists, and is owned by another user

Is there a potential chown race?
Are we on NFS with messed up uids between different nodes?

* The filesystem is weird

Is append mode permitted?
Is NFS translating us to the nobody user or similar?

That's a bit of a brain dump. I'd need to know more about the setup, I think. Could you please include:

* nova.conf
* A description of the storage configuration
* The exact api call executed which resulted in failure

Matt Riedemann (mriedem)
Changed in nova:
status: New → Incomplete
importance: Critical → Undecided
Revision history for this message
Sujai (sujai) wrote :

Hi,

This issue was due to permission issue of the folder /opt/stack/data/nova/instances/locks.It was owned by 'root' user when I faced this issue.
I changed the owner of that folder and currently issue is not seen. I'm able to spin the instance on KVM without any issue.

Hence closing this bug.

Sujai (sujai)
Changed in nova:
status: Incomplete → Invalid
To post a comment you must log in.
This report contains Public information  
Everyone can see this information.

Other bug subscribers

Remote bug watches

Bug watches keep track of this bug in other bug trackers.