ImageCacheManager raises Permission denied error on nova compute in race condition

Bug #1470437 reported by Ankit Agrawal
This bug affects 1 person
Affects                   Status         Importance   Assigned to     Milestone
OpenStack Compute (nova)  Fix Released   Medium       Ankit Agrawal
Liberty                   Fix Released   Undecided    Unassigned

Bug Description

ImageCacheManager raises Permission denied error on nova compute in race condition

While creating an instance snapshot, nova calls the guest.launch method from the libvirt driver, which changes the base file permissions and changes the base file owner from openstack to libvirt-qemu (when the qcow2 image backend is used). A race occurs when ImageCacheManager is about to update the last access time of this base file and the instance snapshot calls guest.launch just before the update: ImageCacheManager then raises a Permission denied error in nova compute from os.utime().

Steps to reproduce:
1. Configure image_cache_manager_interval=120 in nova.conf and use qcow2 image backend.
2. Add a 60-second sleep in the _handle_base_image method of libvirt.imagecache, just before the call to os.utime().
3. Restart nova services.
4. Create an instance from an image.
$ nova boot --image 5e1659aa-6d38-44e8-aaa3-4217337436c0 --flavor 1 instance-1
5. Check that the instance is in the active state.
6. Go to the n-cpu screen and watch the imagecache manager logs until it reaches the sleep statement added in step 2.
7. Send an instance snapshot request while the imagecache manager is waiting in the sleep.
$ nova image-create 19c7900b-73d5-4c2e-b129-5e2a6b13f396 instance-1-snap
8. The instance snapshot request changes the base file owner to libvirt-qemu by calling the guest.launch method from the libvirt driver.
9. When the imagecache manager comes out of the sleep and executes os.utime(), it raises the following Permission denied error in nova compute.

2015-07-01 01:51:46.794 ERROR nova.openstack.common.periodic_task [req-a03fa45f-ffb9-48dd-8937-5b0414c6864b None None] Error during ComputeManager._run_image_cache_manager_pass
2015-07-01 01:51:46.794 TRACE nova.openstack.common.periodic_task Traceback (most recent call last):
2015-07-01 01:51:46.794 TRACE nova.openstack.common.periodic_task File "/opt/stack/nova/nova/openstack/common/periodic_task.py", line 224, in run_periodic_tasks
2015-07-01 01:51:46.794 TRACE nova.openstack.common.periodic_task task(self, context)
2015-07-01 01:51:46.794 TRACE nova.openstack.common.periodic_task File "/opt/stack/nova/nova/compute/manager.py", line 6177, in _run_image_cache_manager_pass
2015-07-01 01:51:46.794 TRACE nova.openstack.common.periodic_task self.driver.manage_image_cache(context, filtered_instances)
2015-07-01 01:51:46.794 TRACE nova.openstack.common.periodic_task File "/opt/stack/nova/nova/virt/libvirt/driver.py", line 6252, in manage_image_cache
2015-07-01 01:51:46.794 TRACE nova.openstack.common.periodic_task self.image_cache_manager.update(context, all_instances)
2015-07-01 01:51:46.794 TRACE nova.openstack.common.periodic_task File "/opt/stack/nova/nova/virt/libvirt/imagecache.py", line 668, in update
2015-07-01 01:51:46.794 TRACE nova.openstack.common.periodic_task self._age_and_verify_cached_images(context, all_instances, base_dir)
2015-07-01 01:51:46.794 TRACE nova.openstack.common.periodic_task File "/opt/stack/nova/nova/virt/libvirt/imagecache.py", line 598, in _age_and_verify_cached_images
2015-07-01 01:51:46.794 TRACE nova.openstack.common.periodic_task self._handle_base_image(img, base_file)
2015-07-01 01:51:46.794 TRACE nova.openstack.common.periodic_task File "/opt/stack/nova/nova/virt/libvirt/imagecache.py", line 570, in _handle_base_image
2015-07-01 01:51:46.794 TRACE nova.openstack.common.periodic_task os.utime(base_file, None)
2015-07-01 01:51:46.794 TRACE nova.openstack.common.periodic_task OSError: [Errno 13] Permission denied: '/opt/stack/data/nova/instances/_base/8d2c340dcce68e48a75457b1e91457feed27aef5'
2015-07-01 01:51:46.794 TRACE nova.openstack.common.periodic_task

Expected result: guest.launch should not update the base file permissions and owner to libvirt-qemu. Base file owner should remain unchanged.

Actual result: Libvirt is updating the base file owner which causes permission issues in nova.
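To confirm that an ownership change is what triggers the error, the owner of the base file can be compared with the user running nova compute. The following is only a diagnostic sketch; describe_ownership is a hypothetical helper name, not part of nova.

```python
import os
import pwd


def describe_ownership(path):
    """Report who owns `path` and whether this process may touch it."""
    st = os.stat(path)
    try:
        owner = pwd.getpwuid(st.st_uid).pw_name
    except KeyError:
        # The uid has no passwd entry; fall back to the numeric id.
        owner = str(st.st_uid)
    return {
        "owner": owner,
        "owned_by_me": st.st_uid == os.geteuid(),
        # os.utime(path, None) needs ownership or write permission.
        "writable": os.access(path, os.W_OK),
    }
```

If "owned_by_me" and "writable" are both False for a file under the _base directory, os.utime(base_file, None) would be expected to fail with Errno 13 as in the trace above.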

Changed in nova:
assignee: nobody → Ankit Agrawal (ankitagrawal)
Ankit Agrawal (ankitagrawal) wrote :

I see this error intermittently in the compute logs. While analyzing the source code to fix this issue, I found that /etc/libvirt/qemu.conf allows the user and group to be configured via the user=$USERNAME and group=$GROUPNAME parameters.
If I change the libvirt user to match the nova user, I no longer see this Permission denied error when the imagecache manager updates the access time of the base/backing file.

Is changing the libvirt configuration a valid way to fix this issue? Please suggest.

Markus Zoeller (markus_z) (mzoeller) wrote : Cleanup

@Ankit Agrawal:

Since you are set as the assignee, I am switching the status to 'In Progress'.

tags: added: libvirt snapshot
Changed in nova:
status: New → In Progress
Ankit Agrawal (ankitagrawal) wrote :

I am a little concerned about the security implications of the solution mentioned in note #1, which configures the libvirt user and group as a non-root user.

Alternatively, we can either check write access on the base file just before calling os.utime, or wrap the call in a try/except block that catches the exception and logs a message saying that the base file permissions were changed by another API call or by guest.launch.
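A minimal sketch of the try/except variant, assuming a standalone helper; touch_base_file and the log wording are hypothetical and are not taken from the nova source:

```python
import errno
import logging
import os

LOG = logging.getLogger(__name__)


def touch_base_file(base_file):
    """Try to refresh the timestamps of base_file.

    Returns True on success and False when the file can no longer be
    touched because its ownership or permissions changed.
    """
    try:
        os.utime(base_file, None)
        return True
    except OSError as exc:
        # Only swallow permission errors; anything else (for example a
        # missing file) is still a real problem and is re-raised.
        if exc.errno not in (errno.EACCES, errno.EPERM):
            raise
        LOG.warning("Cannot update timestamps of %s: permissions were "
                    "probably changed by another operation", base_file)
        return False
```

Catching the exception rather than calling os.access() first avoids a second check-then-act window, since the ownership could still change between the check and the os.utime() call.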

With this approach, when the exception is caught, the last modified time of the base file will not be updated, but its last access time will already have been updated by the guest.launch call that changed the file ownership. So we can calculate the base file age from the last access time instead of the last modified time before deleting it from the cache.

This will solve our problem completely and, IMO, will also reduce the chance of the imagecache manager deleting base files that are still wanted, because the age is calculated from the last access time instead of the last modified time.
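The age calculation described above could look roughly like the sketch below. base_file_age and its parameters are hypothetical names for illustration. Note also that access times are only as reliable as the filesystem mount options allow (noatime or relatime mounts change when atime is updated).

```python
import os
import time


def base_file_age(base_file, now=None, use_atime=True):
    """Return the age of base_file in seconds.

    With use_atime=True the age is measured from the last access time,
    otherwise from the last modified time.
    """
    st = os.stat(base_file)
    stamp = st.st_atime if use_atime else st.st_mtime
    if now is None:
        now = time.time()
    return now - stamp
```

A cache cleaner would then compare the returned value with its configured maximum age before removing the file.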

Please provide your opinion about these approaches. Thanks!

Changed in nova:
importance: Undecided → Medium
OpenStack Infra (hudson-openstack) wrote : Fix merged to nova (master)

Reviewed: https://review.openstack.org/185549
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=ec9d5e375e208686d33b9259b039cc009bded42e
Submitter: Jenkins
Branch: master

commit ec9d5e375e208686d33b9259b039cc009bded42e
Author: Ankit Agrawal <email address hidden>
Date: Mon Aug 10 16:27:57 2015 +1000

    libvirt: Race condition leads to instance in error

    ImageCacheManager deletes base image while image backend is copying
    image to the instance path leading instance to go in the error state.

    Acquired lock before removing image from cache. If libvirt is copying
    image to the instance path, image cache manager won't be able to remove
    it until libvirt finishes copying image completely.

    Closes-Bug: 1256838
    Closes-Bug: 1470437
    Co-Authored-By: Michael Still <email address hidden>
    Depends-On: I337ce28e2fc516c91bec61ca3639ebff0029ad49
    Change-Id: I376cc951922c338669fdf3f83da83e0d3cea1532
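The idea in the commit message (take a lock on the base image before removing it, so that a copy in progress finishes first) can be illustrated with the sketch below. It is not the merged change: threading.Lock stands in for whatever locking primitive nova actually uses, and copy_from_cache, remove_from_cache and _lock_for are hypothetical names. An in-process lock like this only protects threads of one process; separate processes would need an inter-process lock.

```python
import os
import threading
from collections import defaultdict

# One lock per base file path, created on demand.
_locks = defaultdict(threading.Lock)
_locks_guard = threading.Lock()


def _lock_for(path):
    # Guard the dict itself so two threads never race on lock creation.
    with _locks_guard:
        return _locks[path]


def copy_from_cache(base_file, target, copy_fn):
    """Copy base_file to target while holding the per-file lock."""
    with _lock_for(base_file):
        copy_fn(base_file, target)


def remove_from_cache(base_file):
    """Remove base_file, waiting for any copy in progress to finish."""
    with _lock_for(base_file):
        if os.path.exists(base_file):
            os.remove(base_file)
            return True
        return False
```

Because both functions take the same per-file lock, a removal requested during a copy blocks until the copy has completed.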

Changed in nova:
status: In Progress → Fix Released
OpenStack Infra (hudson-openstack) wrote : Fix proposed to nova (stable/liberty)

Fix proposed to branch: stable/liberty
Review: https://review.openstack.org/278928

Doug Hellmann (doug-hellmann) wrote : Fix included in openstack/nova 13.0.0.0b3

This issue was fixed in the openstack/nova 13.0.0.0b3 development milestone.

OpenStack Infra (hudson-openstack) wrote : Change abandoned on nova (stable/liberty)

Change abandoned by Ankit Agrawal (<email address hidden>) on branch: stable/liberty
Review: https://review.openstack.org/278928
