Happens randowmly:
2016-10-06 20:22:18.100 1931 ERROR nova.compute.manager [req-50d2dc6d-072e-423e-9de0-096ba307411f admin demo] [instance: 04709c94-0ba3-4312-ad1c-5739e5600bf4] Instance failed to spawn
2016-10-06 20:22:18.100 1931 ERROR nova.compute.manager [instance: 04709c94-0ba3-4312-ad1c-5739e5600bf4] Traceback (most recent call last):
2016-10-06 20:22:18.100 1931 ERROR nova.compute.manager [instance: 04709c94-0ba3-4312-ad1c-5739e5600bf4] File "/opt/stack/nova/nova/compute/manager.py", line 2078, in _build_resources
2016-10-06 20:22:18.100 1931 ERROR nova.compute.manager [instance: 04709c94-0ba3-4312-ad1c-5739e5600bf4] yield resources
2016-10-06 20:22:18.100 1931 ERROR nova.compute.manager [instance: 04709c94-0ba3-4312-ad1c-5739e5600bf4] File "/opt/stack/nova/nova/compute/manager.py", line 1920, in _build_and_run_instance
2016-10-06 20:22:18.100 1931 ERROR nova.compute.manager [instance: 04709c94-0ba3-4312-ad1c-5739e5600bf4] block_device_info=block_device_info)
2016-10-06 20:22:18.100 1931 ERROR nova.compute.manager [instance: 04709c94-0ba3-4312-ad1c-5739e5600bf4] File "/opt/stack/nova/nova/virt/libvirt/driver.py", line 2571, in spawn
2016-10-06 20:22:18.100 1931 ERROR nova.compute.manager [instance: 04709c94-0ba3-4312-ad1c-5739e5600bf4] admin_pass=admin_password)
2016-10-06 20:22:18.100 1931 ERROR nova.compute.manager [instance: 04709c94-0ba3-4312-ad1c-5739e5600bf4] File "/opt/stack/nova/nova/virt/libvirt/driver.py", line 2960, in _create_image
2016-10-06 20:22:18.100 1931 ERROR nova.compute.manager [instance: 04709c94-0ba3-4312-ad1c-5739e5600bf4] image_id=disk_images['kernel_id'])
2016-10-06 20:22:18.100 1931 ERROR nova.compute.manager [instance: 04709c94-0ba3-4312-ad1c-5739e5600bf4] File "/opt/stack/nova/nova/virt/libvirt/imagebackend.py", line 218, in cache
2016-10-06 20:22:18.100 1931 ERROR nova.compute.manager [instance: 04709c94-0ba3-4312-ad1c-5739e5600bf4] *args, **kwargs)
2016-10-06 20:22:18.100 1931 ERROR nova.compute.manager [instance: 04709c94-0ba3-4312-ad1c-5739e5600bf4] File "/opt/stack/nova/nova/virt/libvirt/imagebackend.py", line 504, in create_image
2016-10-06 20:22:18.100 1931 ERROR nova.compute.manager [instance: 04709c94-0ba3-4312-ad1c-5739e5600bf4] prepare_template(target=base, *args, **kwargs)
2016-10-06 20:22:18.100 1931 ERROR nova.compute.manager [instance: 04709c94-0ba3-4312-ad1c-5739e5600bf4] File "/usr/local/lib/python2.7/dist-packages/oslo_concurrency/lockutils.py", line 264, in inner
2016-10-06 20:22:18.100 1931 ERROR nova.compute.manager [instance: 04709c94-0ba3-4312-ad1c-5739e5600bf4] do_log=False, semaphores=semaphores, delay=delay):
2016-10-06 20:22:18.100 1931 ERROR nova.compute.manager [instance: 04709c94-0ba3-4312-ad1c-5739e5600bf4] File "/usr/lib/python2.7/contextlib.py", line 17, in __enter__
2016-10-06 20:22:18.100 1931 ERROR nova.compute.manager [instance: 04709c94-0ba3-4312-ad1c-5739e5600bf4] return self.gen.next()
2016-10-06 20:22:18.100 1931 ERROR nova.compute.manager [instance: 04709c94-0ba3-4312-ad1c-5739e5600bf4] File "/usr/local/lib/python2.7/dist-packages/oslo_concurrency/lockutils.py", line 216, in lock
2016-10-06 20:22:18.100 1931 ERROR nova.compute.manager [instance: 04709c94-0ba3-4312-ad1c-5739e5600bf4] ext_lock.acquire(delay=delay)
2016-10-06 20:22:18.100 1931 ERROR nova.compute.manager [instance: 04709c94-0ba3-4312-ad1c-5739e5600bf4] File "/usr/local/lib/python2.7/dist-packages/fasteners/process_lock.py", line 151, in acquire
2016-10-06 20:22:18.100 1931 ERROR nova.compute.manager [instance: 04709c94-0ba3-4312-ad1c-5739e5600bf4] self._do_open()
2016-10-06 20:22:18.100 1931 ERROR nova.compute.manager [instance: 04709c94-0ba3-4312-ad1c-5739e5600bf4] File "/usr/local/lib/python2.7/dist-packages/fasteners/process_lock.py", line 123, in _do_open
2016-10-06 20:22:18.100 1931 ERROR nova.compute.manager [instance: 04709c94-0ba3-4312-ad1c-5739e5600bf4] self.lockfile = open(self.path, 'a')
2016-10-06 20:22:18.100 1931 ERROR nova.compute.manager [instance: 04709c94-0ba3-4312-ad1c-5739e5600bf4] IOError: [Errno 13] Permission denied: '/opt/stack/data/nova/instances/locks/nova-273d5645757056cdd056de4cfe9f121b9eee6ae3'
2016-10-06 20:22:18.100 1931 ERROR nova.compute.manager [instance: 04709c94-0ba3-4312-ad1c-5739e5600bf4]
So this is getting EPERM trying to open a lock file. The lock file is for the entry in the image cache, so potentially shared amongst many instances. The specific failure is because we can't open it for append. I'm going to assume there are no bugs in the locking libraries. Immediate causes of the failure that I can think of could be:
* We don't have write access to /opt/stack/ data/nova/ instances/ locks/
How is this directory created?
Is there any potential race during creation, e.g. create as root then chown?
Is it potentially created somewhere which gives it root ownership if it runs first?
Are we on NFS with messed up uids between different nodes?
* /opt/stack/ data/nova/ instances/ locks/nova- 273d5645757056c dd056de4cfe9f12 1b9eee6ae3 exists, and is owned by another user
Is there a potential chown race?
Are we on NFS with messed up uids between different nodes?
* The filesystem is weird
Is append mode permitted?
Is NFS translating us to the nobody user or similar?
That's a bit of a brain dump. I'd need to know more about the setup, I think. Could you please include:
* nova.conf
* A description of the storage configuration
* The exact api call executed which resulted in failure