File permission error when running volume snapshot tests

Bug #1473116 reported by Silvan Kaiser
Affects    Status     Importance    Assigned to    Milestone
Cinder     Invalid    Undecided     Unassigned     -
tempest    Invalid    Undecided     Unassigned     -

Bug Description

-------
Update:
The core issue is described in comment #4; please go there for a quick read.

-------

First of all: I am currently not sure which of the following components this bug belongs to:
1) Our Quobyte Cinder driver
2) The Cinder RemoteFS driver
3) Tempest's VolumesV2SnapshotTestJSON / VolumesV1SnapshotTestJSON implementation
I'm starting with the Cinder project because I think the bug most likely lies in the Cinder or Quobyte driver code.

The error:
Our Quobyte Cinder driver (derived from the RemoteFS driver) throws the following error when running Tempest's volume tests as a group (e.g. 'tox -e all -- volume'):

2015-07-09 06:28:18.263 DEBUG oslo_concurrency.processutils [req-6c70e5ad-f7b8-4cc3-a1e1-43671119bb94 tempest-VolumesV1SnapshotTestJSON-1379670915] u'qemu-img create -f qcow2 -o backing_file=/opt/stack/data/cinder/mnt/38effaab97f9ee27ecb78121f511fa07/volume-36a9b848-c388-4cff-9a07-bb496d9b6574.2a2298c2-4a23-4f78-bcc3-bf8ec51b1485 /opt/stack/data/cinder/mnt/38effaab97f9ee27ecb78121f511fa07/volume-36a9b848-c388-4cff-9a07-bb496d9b6574.1c3aa8f9-9c09-49ec-b524-eea876b87c51' failed. Not Retrying. execute /usr/local/lib/python2.7/dist-packages/oslo_concurrency/processutils.py:295
2015-07-09 06:28:18.264 DEBUG oslo_concurrency.lockutils [req-6c70e5ad-f7b8-4cc3-a1e1-43671119bb94 tempest-VolumesV1SnapshotTestJSON-1379670915] Lock "quobyte" released by "create_snapshot" :: held 0.035s inner /usr/local/lib/python2.7/dist-packages/oslo_concurrency/lockutils.py:262
2015-07-09 06:28:18.304 ERROR oslo_messaging.rpc.dispatcher [req-6c70e5ad-f7b8-4cc3-a1e1-43671119bb94 tempest-VolumesV1SnapshotTestJSON-1379670915] Exception during message handling: Unexpected error while running command.
Command: None
Exit code: -
Stdout: u'Unexpected error while running command.\nCommand: qemu-img create -f qcow2 -o backing_file=/opt/stack/data/cinder/mnt/38effaab97f9ee27ecb78121f511fa07/volume-36a9b848-c388-4cff-9a07-bb496d9b6574.2a2298c2-4a23-4f78-bcc3-bf8ec51b1485 /opt/stack/data/cinder/mnt/38effaab97f9ee27ecb78121f511fa07/volume-36a9b848-c388-4cff-9a07-bb496d9b6574.1c3aa8f9-9c09-49ec-b524-eea876b87c51\nExit code: 1\nStdout: u\'\'\nStderr: u"qemu-img: /opt/stack/data/cinder/mnt/38effaab97f9ee27ecb78121f511fa07/volume-36a9b848-c388-4cff-9a07-bb496d9b6574.1c3aa8f9-9c09-49ec-b524-eea876b87c51: Could not open \'/opt/stack/data/cinder/mnt/38effaab97f9ee27ecb78121f511fa07/volume-36a9b848-c388-4cff-9a07-bb496d9b6574.2a2298c2-4a23-4f78-bcc3-bf8ec51b1485\': Could not open \'/opt/stack/data/cinder/mnt/38effaab97f9ee27ecb78121f511fa07/volume-36a9b848-c388-4cff-9a07-bb496d9b6574.2a2298c2-4a23-4f78-bcc3-bf8ec51b1485\': Permission denied: Permission denied\\n"'
Stderr: None
2015-07-09 06:28:18.304 11468 ERROR oslo_messaging.rpc.dispatcher Traceback (most recent call last):
2015-07-09 06:28:18.304 11468 ERROR oslo_messaging.rpc.dispatcher File "/usr/local/lib/python2.7/dist-packages/oslo_messaging/rpc/dispatcher.py", line 142, in _dispatch_and_reply
2015-07-09 06:28:18.304 11468 ERROR oslo_messaging.rpc.dispatcher executor_callback))
2015-07-09 06:28:18.304 11468 ERROR oslo_messaging.rpc.dispatcher File "/usr/local/lib/python2.7/dist-packages/oslo_messaging/rpc/dispatcher.py", line 186, in _dispatch
2015-07-09 06:28:18.304 11468 ERROR oslo_messaging.rpc.dispatcher executor_callback)
2015-07-09 06:28:18.304 11468 ERROR oslo_messaging.rpc.dispatcher File "/usr/local/lib/python2.7/dist-packages/oslo_messaging/rpc/dispatcher.py", line 130, in _do_dispatch
2015-07-09 06:28:18.304 11468 ERROR oslo_messaging.rpc.dispatcher result = func(ctxt, **new_args)
2015-07-09 06:28:18.304 11468 ERROR oslo_messaging.rpc.dispatcher File "/usr/local/lib/python2.7/dist-packages/osprofiler/profiler.py", line 105, in wrapper
2015-07-09 06:28:18.304 11468 ERROR oslo_messaging.rpc.dispatcher return f(*args, **kwargs)
2015-07-09 06:28:18.304 11468 ERROR oslo_messaging.rpc.dispatcher File "/opt/stack/cinder/cinder/volume/manager.py", line 662, in create_snapshot
2015-07-09 06:28:18.304 11468 ERROR oslo_messaging.rpc.dispatcher snapshot.save()
2015-07-09 06:28:18.304 11468 ERROR oslo_messaging.rpc.dispatcher File "/usr/local/lib/python2.7/dist-packages/oslo_utils/excutils.py", line 119, in __exit__
2015-07-09 06:28:18.304 11468 ERROR oslo_messaging.rpc.dispatcher six.reraise(self.type_, self.value, self.tb)
2015-07-09 06:28:18.304 11468 ERROR oslo_messaging.rpc.dispatcher File "/opt/stack/cinder/cinder/volume/manager.py", line 654, in create_snapshot
2015-07-09 06:28:18.304 11468 ERROR oslo_messaging.rpc.dispatcher model_update = self.driver.create_snapshot(snapshot)
2015-07-09 06:28:18.304 11468 ERROR oslo_messaging.rpc.dispatcher File "/usr/local/lib/python2.7/dist-packages/osprofiler/profiler.py", line 105, in wrapper
2015-07-09 06:28:18.304 11468 ERROR oslo_messaging.rpc.dispatcher return f(*args, **kwargs)
2015-07-09 06:28:18.304 11468 ERROR oslo_messaging.rpc.dispatcher File "/usr/local/lib/python2.7/dist-packages/oslo_concurrency/lockutils.py", line 252, in inner
2015-07-09 06:28:18.304 11468 ERROR oslo_messaging.rpc.dispatcher return f(*args, **kwargs)
2015-07-09 06:28:18.304 11468 ERROR oslo_messaging.rpc.dispatcher File "/opt/stack/cinder/cinder/volume/drivers/quobyte.py", line 214, in create_snapshot
2015-07-09 06:28:18.304 11468 ERROR oslo_messaging.rpc.dispatcher return self._create_snapshot(snapshot)
2015-07-09 06:28:18.304 11468 ERROR oslo_messaging.rpc.dispatcher File "/opt/stack/cinder/cinder/volume/drivers/remotefs.py", line 1259, in _create_snapshot
2015-07-09 06:28:18.304 11468 ERROR oslo_messaging.rpc.dispatcher new_snap_path)
2015-07-09 06:28:18.304 11468 ERROR oslo_messaging.rpc.dispatcher File "/opt/stack/cinder/cinder/volume/drivers/remotefs.py", line 1119, in _do_create_snapshot
2015-07-09 06:28:18.304 11468 ERROR oslo_messaging.rpc.dispatcher self._execute(*command, run_as_root=self._execute_as_root)
2015-07-09 06:28:18.304 11468 ERROR oslo_messaging.rpc.dispatcher File "/opt/stack/cinder/cinder/utils.py", line 152, in execute
2015-07-09 06:28:18.304 11468 ERROR oslo_messaging.rpc.dispatcher return processutils.execute(*cmd, **kwargs)
2015-07-09 06:28:18.304 11468 ERROR oslo_messaging.rpc.dispatcher File "/usr/local/lib/python2.7/dist-packages/oslo_concurrency/processutils.py", line 262, in execute
2015-07-09 06:28:18.304 11468 ERROR oslo_messaging.rpc.dispatcher cmd=sanitized_cmd)
2015-07-09 06:28:18.304 11468 ERROR oslo_messaging.rpc.dispatcher ProcessExecutionError: Unexpected error while running command.
2015-07-09 06:28:18.304 11468 ERROR oslo_messaging.rpc.dispatcher Command: qemu-img create -f qcow2 -o backing_file=/opt/stack/data/cinder/mnt/38effaab97f9ee27ecb78121f511fa07/volume-36a9b848-c388-4cff-9a07-bb496d9b6574.2a2298c2-4a23-4f78-bcc3-bf8ec51b1485 /opt/stack/data/cinder/mnt/38effaab97f9ee27ecb78121f511fa07/volume-36a9b848-c388-4cff-9a07-bb496d9b6574.1c3aa8f9-9c09-49ec-b524-eea876b87c51
2015-07-09 06:28:18.304 11468 ERROR oslo_messaging.rpc.dispatcher Exit code: 1
2015-07-09 06:28:18.304 11468 ERROR oslo_messaging.rpc.dispatcher Stdout: u''
2015-07-09 06:28:18.304 11468 ERROR oslo_messaging.rpc.dispatcher Stderr: u"qemu-img: /opt/stack/data/cinder/mnt/38effaab97f9ee27ecb78121f511fa07/volume-36a9b848-c388-4cff-9a07-bb496d9b6574.1c3aa8f9-9c09-49ec-b524-eea876b87c51: Could not open '/opt/stack/data/cinder/mnt/38effaab97f9ee27ecb78121f511fa07/volume-36a9b848-c388-4cff-9a07-bb496d9b6574.2a2298c2-4a23-4f78-bcc3-bf8ec51b1485': Could not open '/opt/stack/data/cinder/mnt/38effaab97f9ee27ecb78121f511fa07/volume-36a9b848-c388-4cff-9a07-bb496d9b6574.2a2298c2-4a23-4f78-bcc3-bf8ec51b1485': Permission denied: Permission denied\n"

Examples of this can be found in the Quobyte CI logs, e.g. http://176.9.127.22:8081/refs-changes-53-104653-3/ (complete logs of all components for a whole Tempest volume test run).

Several tests fail with this error, all from the VolumesV[1,2]SnapshotTestJSON classes:
{0} tempest.api.volume.test_volumes_snapshots.VolumesV1SnapshotTestJSON.test_snapshot_create_get_list_update_delete [4.649300s] ... ok
{0} tempest.api.volume.test_volumes_snapshots.VolumesV1SnapshotTestJSON.test_snapshot_create_with_volume_in_use [25.604304s] ... ok
{0} tempest.api.volume.test_volumes_snapshots.VolumesV1SnapshotTestJSON.test_snapshots_list_details_with_params [2.261079s] ... FAILED
{0} tempest.api.volume.test_volumes_snapshots.VolumesV1SnapshotTestJSON.test_snapshots_list_with_params [2.271693s] ... FAILED
{0} tempest.api.volume.test_volumes_snapshots.VolumesV1SnapshotTestJSON.test_volume_from_snapshot [2.250898s] ... FAILED
{0} tempest.api.volume.test_volumes_snapshots.VolumesV2SnapshotTestJSON.test_snapshot_create_get_list_update_delete [4.530919s] ... ok
{1} tempest.scenario.test_minimum_basic.TestMinimumBasicScenario.test_minimum_basic_scenario [56.559328s] ... ok
{0} tempest.api.volume.test_volumes_snapshots.VolumesV2SnapshotTestJSON.test_snapshot_create_with_volume_in_use [24.537383s] ... ok
{0} tempest.api.volume.test_volumes_snapshots.VolumesV2SnapshotTestJSON.test_snapshots_list_details_with_params [2.273560s] ... FAILED
{0} tempest.api.volume.test_volumes_snapshots.VolumesV2SnapshotTestJSON.test_snapshots_list_with_params [2.337515s] ... FAILED
{0} tempest.api.volume.test_volumes_snapshots.VolumesV2SnapshotTestJSON.test_volume_from_snapshot [2.273338s] ... FAILED

Interestingly, these errors do _not_ turn up when running the failing tests one by one. If I run tempest.api.volume.test_volumes_snapshots.VolumesV2SnapshotTestJSON.test_snapshots_list_details_with_params standalone, for example, it passes. Only when run as a group do these tests fail.

Example for a single run:
ubuntu@manualvm:/opt/stack/tempest$ tox -e all -- tempest.api.volume.test_volumes_snapshots.VolumesV1SnapshotTestJSON.test_volume_from_snapshot
all develop-inst-nodeps: /opt/stack/tempest
all installed: anyjson==0.3.3,Babel==1.3,boto==2.38.0,cffi==1.1.2,cliff==1.13.0,cmd2==0.6.8,cryptography==0.9.2,debtcollector==0.5.0,ecdsa==0.13,enum34==1.0.4,extras==0.0.3,fasteners==0.12.0,fixtures==1.3.1,functools32==3.2.3.post1,httplib2==0.9.1,idna==2.0,ipaddress==1.0.7,iso8601==0.1.10,jsonschema==2.5.1,linecache2==1.0.0,monotonic==0.2,msgpack-python==0.4.6,netaddr==0.7.15,netifaces==0.10.4,os-testr==0.2.0,oslo.concurrency==2.2.0,oslo.config==1.14.0,oslo.context==0.4.0,oslo.i18n==2.1.0,oslo.log==1.6.0,oslo.serialization==1.7.0,oslo.utils==1.8.0,paramiko==1.15.2,pbr==1.2.0,posix-ipc==1.0.0,prettytable==0.7.2,pyasn1==0.1.8,pycparser==2.14,pycrypto==2.6.1,pyOpenSSL==0.15.1,pyparsing==2.0.3,python-mimeparse==0.1.4,python-subunit==1.1.0,pytz==2015.4,PyYAML==3.11,retrying==1.3.3,six==1.9.0,stevedore==1.6.0,-e git+https://git.openstack.org/openstack/tempest.git@91ebdfac2f6021aa1bccceaf4274bb801cc0891b#egg=tempest-master,tempest-lib==0.6.1,testrepository==0.0.20,testscenarios==0.5.0,testtools==1.8.0,traceback2==1.4.0,unittest2==1.1.0,wheel==0.24.0,wrapt==1.10.5
all runtests: PYTHONHASHSEED='264817387'
all runtests: commands[0] | find . -type f -name *.pyc -delete
all runtests: commands[1] | bash tools/pretty_tox.sh tempest.api.volume.test_volumes_snapshots.VolumesV1SnapshotTestJSON.test_volume_from_snapshot
running testr
running=OS_STDOUT_CAPTURE=${OS_STDOUT_CAPTURE:-1} \
OS_STDERR_CAPTURE=${OS_STDERR_CAPTURE:-1} \
OS_TEST_TIMEOUT=${OS_TEST_TIMEOUT:-500} \
OS_TEST_LOCK_PATH=${OS_TEST_LOCK_PATH:-${TMPDIR:-'/tmp'}} \
${PYTHON:-python} -m subunit.run discover -t ${OS_TOP_LEVEL:-./} ${OS_TEST_PATH:-./tempest/test_discover} --list
running=OS_STDOUT_CAPTURE=${OS_STDOUT_CAPTURE:-1} \
OS_STDERR_CAPTURE=${OS_STDERR_CAPTURE:-1} \
OS_TEST_TIMEOUT=${OS_TEST_TIMEOUT:-500} \
OS_TEST_LOCK_PATH=${OS_TEST_LOCK_PATH:-${TMPDIR:-'/tmp'}} \
${PYTHON:-python} -m subunit.run discover -t ${OS_TOP_LEVEL:-./} ${OS_TEST_PATH:-./tempest/test_discover} --load-list /tmp/tmpjGQTWu
{0} tempest.api.volume.test_volumes_snapshots.VolumesV1SnapshotTestJSON.test_volume_from_snapshot [12.260094s] ... ok

======
Totals
======
Ran: 1 tests in 21.0000 sec.
 - Passed: 1
 - Skipped: 0
 - Expected Fail: 0
 - Unexpected Success: 0
 - Failed: 0
Sum of execute time for each test: 12.2601 sec.

==============
Worker Balance
==============
 - Worker 0 (1 tests) => 0:00:12.260094s

Slowest Tests:

Test id Runtime (s)
-------------------------------------------------------------------------------------------------------------------------------------- -----------
tempest.api.volume.test_volumes_snapshots.VolumesV1SnapshotTestJSON.test_volume_from_snapshot[id-677863d1-3142-456d-b6ac-9924f667a7f4] 12.260
_____________________________________________________________________________________________________ summary _____________________________________________________________________________________________________
  all: commands succeeded
  congratulations :)

Now, running the whole test class yields:
all develop-inst-nodeps: /opt/stack/tempest
all installed: anyjson==0.3.3,Babel==1.3,boto==2.38.0,cffi==1.1.2,cliff==1.13.0,cmd2==0.6.8,cryptography==0.9.2,debtcollector==0.5.0,ecdsa==0.13,enum34==1.0.4,extras==0.0.3,fasteners==0.12.0,fixtures==1.3.1,functools32==3.2.3.post1,httplib2==0.9.1,idna==2.0,ipaddress==1.0.7,iso8601==0.1.10,jsonschema==2.5.1,linecache2==1.0.0,monotonic==0.2,msgpack-python==0.4.6,netaddr==0.7.15,netifaces==0.10.4,os-testr==0.2.0,oslo.concurrency==2.2.0,oslo.config==1.14.0,oslo.context==0.4.0,oslo.i18n==2.1.0,oslo.log==1.6.0,oslo.serialization==1.7.0,oslo.utils==1.8.0,paramiko==1.15.2,pbr==1.2.0,posix-ipc==1.0.0,prettytable==0.7.2,pyasn1==0.1.8,pycparser==2.14,pycrypto==2.6.1,pyOpenSSL==0.15.1,pyparsing==2.0.3,python-mimeparse==0.1.4,python-subunit==1.1.0,pytz==2015.4,PyYAML==3.11,retrying==1.3.3,six==1.9.0,stevedore==1.6.0,-e git+https://git.openstack.org/openstack/tempest.git@91ebdfac2f6021aa1bccceaf4274bb801cc0891b#egg=tempest-master,tempest-lib==0.6.1,testrepository==0.0.20,testscenarios==0.5.0,testtools==1.8.0,traceback2==1.4.0,unittest2==1.1.0,wheel==0.24.0,wrapt==1.10.5
all runtests: PYTHONHASHSEED='3462064401'
all runtests: commands[0] | find . -type f -name *.pyc -delete
all runtests: commands[1] | bash tools/pretty_tox.sh tempest.api.volume.test_volumes_snapshots.VolumesV1SnapshotTestJSON
running testr
running=OS_STDOUT_CAPTURE=${OS_STDOUT_CAPTURE:-1} \
OS_STDERR_CAPTURE=${OS_STDERR_CAPTURE:-1} \
OS_TEST_TIMEOUT=${OS_TEST_TIMEOUT:-500} \
OS_TEST_LOCK_PATH=${OS_TEST_LOCK_PATH:-${TMPDIR:-'/tmp'}} \
${PYTHON:-python} -m subunit.run discover -t ${OS_TOP_LEVEL:-./} ${OS_TEST_PATH:-./tempest/test_discover} --list
running=OS_STDOUT_CAPTURE=${OS_STDOUT_CAPTURE:-1} \
OS_STDERR_CAPTURE=${OS_STDERR_CAPTURE:-1} \
OS_TEST_TIMEOUT=${OS_TEST_TIMEOUT:-500} \
OS_TEST_LOCK_PATH=${OS_TEST_LOCK_PATH:-${TMPDIR:-'/tmp'}} \
${PYTHON:-python} -m subunit.run discover -t ${OS_TOP_LEVEL:-./} ${OS_TEST_PATH:-./tempest/test_discover} --load-list /tmp/tmp4Nkte1
{0} tempest.api.volume.test_volumes_snapshots.VolumesV1SnapshotTestJSON.test_snapshot_create_get_list_update_delete [4.813857s] ... ok
{0} tempest.api.volume.test_volumes_snapshots.VolumesV1SnapshotTestJSON.test_snapshot_create_with_volume_in_use [24.101065s] ... ok
{0} tempest.api.volume.test_volumes_snapshots.VolumesV1SnapshotTestJSON.test_snapshots_list_details_with_params [2.251190s] ... FAILED
{0} tempest.api.volume.test_volumes_snapshots.VolumesV1SnapshotTestJSON.test_snapshots_list_with_params [2.229117s] ... FAILED
{0} tempest.api.volume.test_volumes_snapshots.VolumesV1SnapshotTestJSON.test_volume_from_snapshot [2.232561s] ... FAILED

==============================
Failed 3 tests - output below:
==============================

[...]

The Quobyte driver runs with nas_secure_file_operations and nas_secure_file_permissions set to true, meaning it never uses root:root as UID:GID and always sets file permissions to 660. Manual testing, however, showed that during the failing test runs image files owned by root:root with mode 660 appeared, e.g.:
-rw-rw---- 1 root root 393216 Jul 9 10:07 volume-afc37512-6c8a-4962-a512-188a59d7720b.12edbd44-cbc3-47c3-82af-1eaa5a0e5339
-rw-r--r-- 1 ubuntu ubuntu 97 Jul 9 10:06 volume-afc37512-6c8a-4962-a512-188a59d7720b.info
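A quick way to spot such files is to scan the mount point for regular files whose owner differs from the expected non-root user. This is a hypothetical debugging helper, not part of the driver or the tests:

```python
import os
import stat

def find_badly_owned(mount_point, expected_uid):
    """Yield (path, uid, gid, mode) for regular files not owned by expected_uid."""
    for name in sorted(os.listdir(mount_point)):
        path = os.path.join(mount_point, name)
        st = os.lstat(path)
        if stat.S_ISREG(st.st_mode) and st.st_uid != expected_uid:
            yield (path, st.st_uid, st.st_gid, oct(st.st_mode & 0o777))

# Example: flag anything under the Cinder mount not owned by the current user.
# for entry in find_badly_owned(
#         "/opt/stack/data/cinder/mnt/38effaab97f9ee27ecb78121f511fa07",
#         os.getuid()):
#     print(entry)
```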

Now the big question is: why is this image file owned by root:root? I do not see how the driver could have done this, nor why it occurs only when the tests run as a group and not when they run individually. As far as I can see the test case cleans up after each test, so that should not be an issue either.

Quobyte driver can be found e.g. here: https://github.com/openstack/cinder/blob/stable/kilo/cinder/volume/drivers/quobyte.py

An earlier bug report that did not get as deep can be found here: https://bugs.launchpad.net/cinder/+bug/1438603

Revision history for this message
Silvan Kaiser (2-silvan) wrote :

Small update: we're still debugging this, currently logging and analyzing the operations taking place during the tests at the file system level. So far we've not found a decisive clue as to what's causing this.

Changed in cinder:
assignee: nobody → Silvan Kaiser (2-silvan)
Revision history for this message
Silvan Kaiser (2-silvan) wrote :

So, further analysis shows that AppArmor is interfering with the qemu operations.
Uninstalling AppArmor changes the behavior of the tempest.api.volume.test_volumes_snapshots.VolumesV1SnapshotTestJSON tests: where three of those five tests fail with permission denied (see above) with AppArmor installed, only one test fails without it. The failing test is tempest.api.volume.test_volumes_snapshots.VolumesV2SnapshotTestJSON.test_snapshot_create_with_volume_in_use, a test that passed in the run _with_ AppArmor. This single test fails because libvirt is unable to load its AppArmor profile.

Looking deeper into this...

Revision history for this message
Silvan Kaiser (2-silvan) wrote :

libvirtd.log shows:
2015-07-15 17:48:49.964+0000: 30240: error : virCommandWait:2399 : internal error: Child process (/usr/lib/libvirt/virt-aa-helper -p 0 -c -u libvirt-29480f07-1a9c-4d15-ba11-11e564c024c9) unexpected exit status 1: virt-aa-helper: error: apparmor_parser exited with error

2015-07-15 17:48:49.964+0000: 30240: debug : virCommandRun:2142 : Result status 0, stdout: '' stderr: 'virt-aa-helper: error: apparmor_parser exited with error
'
2015-07-15 17:48:49.964+0000: 30240: error : AppArmorGenSecurityLabel:451 : internal error: cannot load AppArmor profile 'libvirt-29480f07-1a9c-4d15-ba11-11e564c024c9'

Revision history for this message
Silvan Kaiser (2-silvan) wrote :

Further down the road:

The core issue seems to be libvirt's behaviour of changing an image's ownership when instantiating a guest from it. With the default configuration (dynamic_ownership = 1) libvirt changes the ownership twice:
When the guest is started, the image's ownership is set to that of the libvirt process (as controlled by the user and group settings in qemu.conf). When the guest is destroyed later on (e.g. when a test is torn down), libvirt changes the image's ownership to root:root.
This seems to be a long-standing topic in libvirt, discussed e.g. in https://bugzilla.redhat.com/show_bug.cgi?id=547546 and https://www.redhat.com/archives/libvirt-users/2010-January/msg00023.html

This might be interesting for Tempest, as the tests pass when run individually. The issue is that the tests share the same image file: once one of the tests instantiates a VM, libvirt changes the image's ownership and the following tests fail because they can no longer access the image.

Example test case is shown above:
1) tempest.api.volume.test_volumes_snapshots.VolumesV1SnapshotTestJSON.test_snapshot_create_get_list_update_delete ... ok
2) tempest.api.volume.test_volumes_snapshots.VolumesV1SnapshotTestJSON.test_snapshot_create_with_volume_in_use ... ok
3) tempest.api.volume.test_volumes_snapshots.VolumesV1SnapshotTestJSON.test_snapshots_list_details_with_params ... FAILED
4) tempest.api.volume.test_volumes_snapshots.VolumesV1SnapshotTestJSON.test_snapshots_list_with_params ... FAILED
5) tempest.api.volume.test_volumes_snapshots.VolumesV1SnapshotTestJSON.test_volume_from_snapshot ... FAILED

Test no. 2 instantiates the VM; subsequently all other tests fail.

To me the core issue is libvirt's behaviour. Until that changes, I think this could be handled in Tempest by removing this side effect from the affected test cases, namely VolumesV2SnapshotTestJSON and VolumesV1SnapshotTestJSON.
Alternatively our driver could step in and change the ownership back, but that's not a nice solution...

Revision history for this message
Silvan Kaiser (2-silvan) wrote :

Could somebody from the Tempest team please check whether fixing this through improved test isolation in Tempest is a viable solution from your point of view?

affects: cinder → tempest
Changed in tempest:
assignee: Silvan Kaiser (2-silvan) → nobody
Revision history for this message
Silvan Kaiser (2-silvan) wrote :

I'll add a log example from the qemu-img wrapper I wrote to see what exactly is happening with the qemu-img operations. It shows each qemu-img call during the VolumesV2SnapshotTestJSON tests with some metadata, including the moment the ownership of the backing file changes and the subsequent failing qemu-img operations.

-----------------<2015-07-14 19:46:47>----------------
With user 1000 and group 1000:
/usr/bin/qemu-img create -f qcow2 -o preallocation=metadata /opt/stack/data/cinder/mnt/38effaab97f9ee27ecb78121f511fa07/volume-e91c886a-8edc-416e-866c-63b2280add2b 1073741824
---------Active Quobyte clients:
ubuntu 25917 0.0 0.3 2187720 26420 ? Ssl 12:04 0:06 mount.quobyte 176.9.127.22:7861/quobyteci_test_volume /opt/stack/data/cinder/mnt/38effaab97f9ee27ecb78121f511fa07
ubuntu 3986 6.2 0.2 2179400 20584 ? Ssl 19:45 0:06 mount.quobyte 176.9.127.22:7861/quobyteci_test_volume /opt/stack/data/nova/mnt/38effaab97f9ee27ecb78121f511fa07
---------File path for 1073741824 prior to operation:
------Operation:
----OutStream:
Formatting '/opt/stack/data/cinder/mnt/38effaab97f9ee27ecb78121f511fa07/volume-e91c886a-8edc-416e-866c-63b2280add2b', fmt=qcow2 size=1073741824 encryption=off cluster_size=65536 preallocation='metadata' lazy_refcounts=off
----ErrStream:
----------File path after operation:

----------------------Exit code 0------------------
-----------------<2015-07-14 19:46:49>----------------
With user 1000 and group 1000:
/usr/bin/qemu-img create -f qcow2 -o backing_file=/opt/stack/data/cinder/mnt/38effaab97f9ee27ecb78121f511fa07/volume-e91c886a-8edc-416e-866c-63b2280add2b /opt/stack/data/cinder/mnt/38effaab97f9ee27ecb78121f511fa07/volume-e91c886a-8edc-416e-866c-63b2280add2b.25d49040-2878-4150-9e59-f0ffa867da8a
---------Active Quobyte clients:
ubuntu 25917 0.0 0.3 2187720 26612 ? Ssl 12:04 0:06 mount.quobyte 176.9.127.22:7861/quobyteci_test_volume /opt/stack/data/cinder/mnt/38effaab97f9ee27ecb78121f511fa07
ubuntu 3986 6.1 0.2 2179400 20584 ? Ssl 19:45 0:06 mount.quobyte 176.9.127.22:7861/quobyteci_test_volume /opt/stack/data/nova/mnt/38effaab97f9ee27ecb78121f511fa07
---------File path for /opt/stack/data/cinder/mnt/38effaab97f9ee27ecb78121f511fa07/volume-e91c886a-8edc-416e-866c-63b2280add2b prior to operation:
-rw-rw-rw- 1 ubuntu ubuntu 1074135040 Jul 14 19:45 /opt/stack/data/cinder/mnt/38effaab97f9ee27ecb78121f511fa07/volume-e91c886a-8edc-416e-866c-63b2280add2b
------Operation:
----OutStream:
Formatting '/opt/stack/data/cinder/mnt/38effaab97f9ee27ecb78121f511fa07/volume-e91c886a-8edc-416e-866c-63b2280add2b.25d49040-2878-4150-9e59-f0ffa867da8a', fmt=qcow2 size=1073741824 backing_file='/opt/stack/data/cinder/mnt/38effaab97f9ee27ecb78121f511fa07/volume-e91c886a-8edc-416e-866c-63b2280add2b' encryption=off cluster_size=65536 lazy_refcounts=off
----ErrStream:
----------File path after operation:
-rw-rw-rw- 1 ubuntu ubuntu 1074135040 Jul 14 19:45 /opt/stack/data/cinder/mnt/38effaab97f9ee27ecb78121f511fa07/volume-e91c886a-8edc-416e-866c-63b2280add2b

----------------------Exit code 0------------------
-----------------<2015-07-14 19:46:50>----------------
With user 0 and group 0:
/usr/bin/qemu-img info ...
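A minimal wrapper of this kind can be sketched as follows. This is a hypothetical reconstruction, not the exact script used in our CI; it assumes the real binary has been moved aside to /usr/bin/qemu-img.real:

```python
#!/usr/bin/env python
# Hypothetical sketch of a qemu-img logging wrapper (not the exact CI script).
# It records the calling UID/GID and the ownership of any path arguments
# before and after delegating to the real binary.
import grp
import os
import pwd
import subprocess
import sys

REAL_QEMU_IMG = "/usr/bin/qemu-img.real"  # real binary moved aside (assumption)

def describe(path):
    """Return an ownership summary for path, or a note if it does not exist."""
    try:
        st = os.stat(path)
    except OSError:
        return "%s: does not exist yet" % path
    user = pwd.getpwuid(st.st_uid).pw_name
    group = grp.getgrgid(st.st_gid).gr_name
    return "%s %s:%s mode %o" % (path, user, group, st.st_mode & 0o777)

def main(argv):
    print("With user %d and group %d:" % (os.getuid(), os.getgid()))
    # Treat any argument containing a slash (including -o backing_file=...)
    # as a path whose ownership is worth logging.
    paths = [arg.split("=", 1)[-1] for arg in argv if "/" in arg]
    for p in paths:
        print("before: " + describe(p))
    rc = subprocess.call([REAL_QEMU_IMG] + argv)
    for p in paths:
        print("after:  " + describe(p))
    return rc

if __name__ == "__main__" and os.path.exists(REAL_QEMU_IMG):
    sys.exit(main(sys.argv[1:]))
```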

Revision history for this message
Jordan Pittier (jordan-pittier) wrote :

Any idea why Quobyte is affected and not GlusterFS?

Could you try with quobyte_qcow2_volumes set to False please ?

Revision history for this message
Silvan Kaiser (2-silvan) wrote :

My guess is a different libvirt configuration; I'm currently successfully testing a libvirt reconfiguration with dynamic_ownership = 0 and common UID:GID settings for libvirtd and the testing user.

I can test with raw in a few minutes.
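For reference, such a reconfiguration of libvirt's qemu.conf would look roughly like this. It is a sketch based only on the settings named above; the user and group values are placeholders that must match the user actually running the tests:

```ini
# /etc/libvirt/qemu.conf (sketch; values are examples)

# Stop libvirt from chown'ing disk images on guest start/stop:
dynamic_ownership = 0

# Run qemu processes with the same UID/GID as the testing user
# (placeholder names; adjust to the local setup):
user = "ubuntu"
group = "ubuntu"
```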

Revision history for this message
Silvan Kaiser (2-silvan) wrote :

Same results with quobyte_qcow2_volumes = False: permission denied errors. The libvirt reconfiguration works, however.

Revision history for this message
Silvan Kaiser (2-silvan) wrote :

So, we've been able to work around this issue in our CI by adapting the libvirt configuration.

Besides that, IMHO the Tempest team should check whether stronger test isolation can or should be applied to the aforementioned test cases.

Revision history for this message
Eric Harney (eharney) wrote :

I'm not clear on what is causing the file permissions to change from libvirt:qemu-kvm 644 to 660.

The expectation is that libvirt's dynamic_ownership should work with this; it is always used with the Gluster driver. Is this a problem with nas_secure_file_operations?

Revision history for this message
David Kranz (david-kranz) wrote :

Can you please explain how you think Tempest could be changed to prevent this? Tempest is just making API calls.

Changed in tempest:
status: New → Incomplete
Revision history for this message
Silvan Kaiser (2-silvan) wrote :

Thanks for looking into this!

@Eric: Yep, this is with nas_secure_file_operations on.

@David: The idea of looking at the tests came from the fact that they succeed when run individually but fail when run as a group. The image manipulated by one test seems to be reused by other tests afterwards. I would expect tests to be completely isolated, e.g. by not reusing the same volume.
To be clear: I do not mean that this issue is a Tempest bug, but in the context of this issue I thought the Tempest team might be interested in checking whether stronger test isolation is required.
If not, please move this ticket back to Cinder.

Revision history for this message
Silvan Kaiser (2-silvan) wrote :

Instead of switching the project I added the second one.

@David: Please close the Tempest part if deemed applicable.

@Eric: If you have further ideas or experiences I'm happy to hear about them. The permission change is the result of the nas_secure_file_operations flag, as you guessed.

Changed in tempest:
status: Incomplete → New
Revision history for this message
Silvan Kaiser (2-silvan) wrote :

@Eric, @Jordan, I stumbled over this today:

https://bugzilla.redhat.com/show_bug.cgi?id=1057645#c7

It seems GlusterFS had similar issues some time ago, but libvirt implemented GlusterFS autodetection. This would explain why Gluster testing behaves differently from our Quobyte tests, as libvirt currently does not detect Quobyte as a shared file system by itself.
The workaround described there is exactly what we're doing right now in our CI setup.

Silvan Kaiser (2-silvan)
Changed in tempest:
status: New → Opinion
Silvan Kaiser (2-silvan)
Changed in tempest:
status: Opinion → Invalid
Revision history for this message
Silvan Kaiser (2-silvan) wrote :

With the given libvirt/qemu.conf settings, snapshots are working now. Furthermore, quite a few libvirt/ownership/snapshot issues with Nova & Quobyte have been handled in recent months. Thus I'm setting this to Invalid as it seems fixed for now.

Changed in cinder:
status: New → Invalid