-------
Update:
The core issue is described in comment #4; please read that first for a quick summary.
-------
First of all: I am currently not sure in which of the following components this bug is located:
1) Our Quobyte Cinder driver
2) The Cinder RemoteFS driver
3) Tempest's VolumesV2SnapshotTestJSON / VolumesV1SnapshotTestJSON implementation
I am starting with the Cinder project because I think the issue most likely lies in the Cinder or Quobyte driver code.
The error:
Our Quobyte Cinder driver (derived from the RemoteFS driver) throws the following error when running Tempest's volume tests as a group (e.g. 'tox -e all -- volume'):
2015-07-09 06:28:18.263 DEBUG oslo_concurrency.processutils [req-6c70e5ad-f7b8-4cc3-a1e1-43671119bb94 tempest-VolumesV1SnapshotTestJSON-1379670915] u'qemu-img create -f qcow2 -o backing_file=/opt/stack/data/cinder/mnt/38effaab97f9ee27ecb78121f511fa07/volume-36a9b848-c388-4cff-9a07-bb496d9b6574.2a2298c2-4a23-4f78-bcc3-bf8ec51b1485 /opt/stack/data/cinder/mnt/38effaab97f9ee27ecb78121f511fa07/volume-36a9b848-c388-4cff-9a07-bb496d9b6574.1c3aa8f9-9c09-49ec-b524-eea876b87c51' failed. Not Retrying. execute /usr/local/lib/python2.7/dist-packages/oslo_concurrency/processutils.py:295
2015-07-09 06:28:18.264 DEBUG oslo_concurrency.lockutils [req-6c70e5ad-f7b8-4cc3-a1e1-43671119bb94 tempest-VolumesV1SnapshotTestJSON-1379670915] Lock "quobyte" released by "create_snapshot" :: held 0.035s inner /usr/local/lib/python2.7/dist-packages/oslo_concurrency/lockutils.py:262
2015-07-09 06:28:18.304 ERROR oslo_messaging.rpc.dispatcher [req-6c70e5ad-f7b8-4cc3-a1e1-43671119bb94 tempest-VolumesV1SnapshotTestJSON-1379670915] Exception during message handling: Unexpected error while running command.
Command: None
Exit code: -
Stdout: u'Unexpected error while running command.\nCommand: qemu-img create -f qcow2 -o backing_file=/opt/stack/data/cinder/mnt/38effaab97f9ee27ecb78121f511fa07/volume-36a9b848-c388-4cff-9a07-bb496d9b6574.2a2298c2-4a23-4f78-bcc3-bf8ec51b1485 /opt/stack/data/cinder/mnt/38effaab97f9ee27ecb78121f511fa07/volume-36a9b848-c388-4cff-9a07-bb496d9b6574.1c3aa8f9-9c09-49ec-b524-eea876b87c51\nExit code: 1\nStdout: u\'\'\nStderr: u"qemu-img: /opt/stack/data/cinder/mnt/38effaab97f9ee27ecb78121f511fa07/volume-36a9b848-c388-4cff-9a07-bb496d9b6574.1c3aa8f9-9c09-49ec-b524-eea876b87c51: Could not open \'/opt/stack/data/cinder/mnt/38effaab97f9ee27ecb78121f511fa07/volume-36a9b848-c388-4cff-9a07-bb496d9b6574.2a2298c2-4a23-4f78-bcc3-bf8ec51b1485\': Could not open \'/opt/stack/data/cinder/mnt/38effaab97f9ee27ecb78121f511fa07/volume-36a9b848-c388-4cff-9a07-bb496d9b6574.2a2298c2-4a23-4f78-bcc3-bf8ec51b1485\': Permission denied: Permission denied\\n"'
Stderr: None
2015-07-09 06:28:18.304 11468 ERROR oslo_messaging.rpc.dispatcher Traceback (most recent call last):
2015-07-09 06:28:18.304 11468 ERROR oslo_messaging.rpc.dispatcher File "/usr/local/lib/python2.7/dist-packages/oslo_messaging/rpc/dispatcher.py", line 142, in _dispatch_and_reply
2015-07-09 06:28:18.304 11468 ERROR oslo_messaging.rpc.dispatcher executor_callback))
2015-07-09 06:28:18.304 11468 ERROR oslo_messaging.rpc.dispatcher File "/usr/local/lib/python2.7/dist-packages/oslo_messaging/rpc/dispatcher.py", line 186, in _dispatch
2015-07-09 06:28:18.304 11468 ERROR oslo_messaging.rpc.dispatcher executor_callback)
2015-07-09 06:28:18.304 11468 ERROR oslo_messaging.rpc.dispatcher File "/usr/local/lib/python2.7/dist-packages/oslo_messaging/rpc/dispatcher.py", line 130, in _do_dispatch
2015-07-09 06:28:18.304 11468 ERROR oslo_messaging.rpc.dispatcher result = func(ctxt, **new_args)
2015-07-09 06:28:18.304 11468 ERROR oslo_messaging.rpc.dispatcher File "/usr/local/lib/python2.7/dist-packages/osprofiler/profiler.py", line 105, in wrapper
2015-07-09 06:28:18.304 11468 ERROR oslo_messaging.rpc.dispatcher return f(*args, **kwargs)
2015-07-09 06:28:18.304 11468 ERROR oslo_messaging.rpc.dispatcher File "/opt/stack/cinder/cinder/volume/manager.py", line 662, in create_snapshot
2015-07-09 06:28:18.304 11468 ERROR oslo_messaging.rpc.dispatcher snapshot.save()
2015-07-09 06:28:18.304 11468 ERROR oslo_messaging.rpc.dispatcher File "/usr/local/lib/python2.7/dist-packages/oslo_utils/excutils.py", line 119, in __exit__
2015-07-09 06:28:18.304 11468 ERROR oslo_messaging.rpc.dispatcher six.reraise(self.type_, self.value, self.tb)
2015-07-09 06:28:18.304 11468 ERROR oslo_messaging.rpc.dispatcher File "/opt/stack/cinder/cinder/volume/manager.py", line 654, in create_snapshot
2015-07-09 06:28:18.304 11468 ERROR oslo_messaging.rpc.dispatcher model_update = self.driver.create_snapshot(snapshot)
2015-07-09 06:28:18.304 11468 ERROR oslo_messaging.rpc.dispatcher File "/usr/local/lib/python2.7/dist-packages/osprofiler/profiler.py", line 105, in wrapper
2015-07-09 06:28:18.304 11468 ERROR oslo_messaging.rpc.dispatcher return f(*args, **kwargs)
2015-07-09 06:28:18.304 11468 ERROR oslo_messaging.rpc.dispatcher File "/usr/local/lib/python2.7/dist-packages/oslo_concurrency/lockutils.py", line 252, in inner
2015-07-09 06:28:18.304 11468 ERROR oslo_messaging.rpc.dispatcher return f(*args, **kwargs)
2015-07-09 06:28:18.304 11468 ERROR oslo_messaging.rpc.dispatcher File "/opt/stack/cinder/cinder/volume/drivers/quobyte.py", line 214, in create_snapshot
2015-07-09 06:28:18.304 11468 ERROR oslo_messaging.rpc.dispatcher return self._create_snapshot(snapshot)
2015-07-09 06:28:18.304 11468 ERROR oslo_messaging.rpc.dispatcher File "/opt/stack/cinder/cinder/volume/drivers/remotefs.py", line 1259, in _create_snapshot
2015-07-09 06:28:18.304 11468 ERROR oslo_messaging.rpc.dispatcher new_snap_path)
2015-07-09 06:28:18.304 11468 ERROR oslo_messaging.rpc.dispatcher File "/opt/stack/cinder/cinder/volume/drivers/remotefs.py", line 1119, in _do_create_snapshot
2015-07-09 06:28:18.304 11468 ERROR oslo_messaging.rpc.dispatcher self._execute(*command, run_as_root=self._execute_as_root)
2015-07-09 06:28:18.304 11468 ERROR oslo_messaging.rpc.dispatcher File "/opt/stack/cinder/cinder/utils.py", line 152, in execute
2015-07-09 06:28:18.304 11468 ERROR oslo_messaging.rpc.dispatcher return processutils.execute(*cmd, **kwargs)
2015-07-09 06:28:18.304 11468 ERROR oslo_messaging.rpc.dispatcher File "/usr/local/lib/python2.7/dist-packages/oslo_concurrency/processutils.py", line 262, in execute
2015-07-09 06:28:18.304 11468 ERROR oslo_messaging.rpc.dispatcher cmd=sanitized_cmd)
2015-07-09 06:28:18.304 11468 ERROR oslo_messaging.rpc.dispatcher ProcessExecutionError: Unexpected error while running command.
2015-07-09 06:28:18.304 11468 ERROR oslo_messaging.rpc.dispatcher Command: qemu-img create -f qcow2 -o backing_file=/opt/stack/data/cinder/mnt/38effaab97f9ee27ecb78121f511fa07/volume-36a9b848-c388-4cff-9a07-bb496d9b6574.2a2298c2-4a23-4f78-bcc3-bf8ec51b1485 /opt/stack/data/cinder/mnt/38effaab97f9ee27ecb78121f511fa07/volume-36a9b848-c388-4cff-9a07-bb496d9b6574.1c3aa8f9-9c09-49ec-b524-eea876b87c51
2015-07-09 06:28:18.304 11468 ERROR oslo_messaging.rpc.dispatcher Exit code: 1
2015-07-09 06:28:18.304 11468 ERROR oslo_messaging.rpc.dispatcher Stdout: u''
2015-07-09 06:28:18.304 11468 ERROR oslo_messaging.rpc.dispatcher Stderr: u"qemu-img: /opt/stack/data/cinder/mnt/38effaab97f9ee27ecb78121f511fa07/volume-36a9b848-c388-4cff-9a07-bb496d9b6574.1c3aa8f9-9c09-49ec-b524-eea876b87c51: Could not open '/opt/stack/data/cinder/mnt/38effaab97f9ee27ecb78121f511fa07/volume-36a9b848-c388-4cff-9a07-bb496d9b6574.2a2298c2-4a23-4f78-bcc3-bf8ec51b1485': Could not open '/opt/stack/data/cinder/mnt/38effaab97f9ee27ecb78121f511fa07/volume-36a9b848-c388-4cff-9a07-bb496d9b6574.2a2298c2-4a23-4f78-bcc3-bf8ec51b1485': Permission denied: Permission denied\n"
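The stderr boils down to an EACCES when qemu-img opens the backing file of the new qcow2 overlay. As a minimal illustration (throwaway file, not the actual Cinder code): the process creating the overlay needs read access to the backing file, which a root:root file with mode 660 denies to an unprivileged cinder-volume process.

```python
import os
import tempfile


def backing_file_readable(path):
    """Return True if the current process may read `path` -- the
    precondition qemu-img effectively needs on the backing file
    before it can create a qcow2 overlay on top of it."""
    return os.access(path, os.R_OK)


# Demonstration: mode 0660 grants access only to the owner and the
# group.  We own this temp file, so the check passes here; for a
# root:root 0660 file and an unprivileged service user it would not.
fd, path = tempfile.mkstemp()
os.close(fd)
os.chmod(path, 0o660)
print(backing_file_readable(path))  # prints True (we are the owner)
os.unlink(path)
```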
Examples of this can be found in the Quobyte CI logs, e.g. at http://176.9.127.22:8081/refs-changes-53-104653-3/ (complete logs of all components for a full Tempest volume test run).
Several tests fail with this error, all of them from the VolumesV[1,2]SnapshotTestJSON test classes:
{0} tempest.api.volume.test_volumes_snapshots.VolumesV1SnapshotTestJSON.test_snapshot_create_get_list_update_delete [4.649300s] ... ok
{0} tempest.api.volume.test_volumes_snapshots.VolumesV1SnapshotTestJSON.test_snapshot_create_with_volume_in_use [25.604304s] ... ok
{0} tempest.api.volume.test_volumes_snapshots.VolumesV1SnapshotTestJSON.test_snapshots_list_details_with_params [2.261079s] ... FAILED
{0} tempest.api.volume.test_volumes_snapshots.VolumesV1SnapshotTestJSON.test_snapshots_list_with_params [2.271693s] ... FAILED
{0} tempest.api.volume.test_volumes_snapshots.VolumesV1SnapshotTestJSON.test_volume_from_snapshot [2.250898s] ... FAILED
{0} tempest.api.volume.test_volumes_snapshots.VolumesV2SnapshotTestJSON.test_snapshot_create_get_list_update_delete [4.530919s] ... ok
{1} tempest.scenario.test_minimum_basic.TestMinimumBasicScenario.test_minimum_basic_scenario [56.559328s] ... ok
{0} tempest.api.volume.test_volumes_snapshots.VolumesV2SnapshotTestJSON.test_snapshot_create_with_volume_in_use [24.537383s] ... ok
{0} tempest.api.volume.test_volumes_snapshots.VolumesV2SnapshotTestJSON.test_snapshots_list_details_with_params [2.273560s] ... FAILED
{0} tempest.api.volume.test_volumes_snapshots.VolumesV2SnapshotTestJSON.test_snapshots_list_with_params [2.337515s] ... FAILED
{0} tempest.api.volume.test_volumes_snapshots.VolumesV2SnapshotTestJSON.test_volume_from_snapshot [2.273338s] ... FAILED
Interestingly, these errors do _not_ turn up when running the failing tests one by one. For example, if I run tempest.api.volume.test_volumes_snapshots.VolumesV2SnapshotTestJSON.test_snapshots_list_details_with_params standalone, it passes. The tests fail only when run as a group.
Example of a single run:
ubuntu@manualvm:/opt/stack/tempest$ tox -e all -- tempest.api.volume.test_volumes_snapshots.VolumesV1SnapshotTestJSON.test_volume_from_snapshot
all develop-inst-nodeps: /opt/stack/tempest
all installed: anyjson==0.3.3,Babel==1.3,boto==2.38.0,cffi==1.1.2,cliff==1.13.0,cmd2==0.6.8,cryptography==0.9.2,debtcollector==0.5.0,ecdsa==0.13,enum34==1.0.4,extras==0.0.3,fasteners==0.12.0,fixtures==1.3.1,functools32==3.2.3.post1,httplib2==0.9.1,idna==2.0,ipaddress==1.0.7,iso8601==0.1.10,jsonschema==2.5.1,linecache2==1.0.0,monotonic==0.2,msgpack-python==0.4.6,netaddr==0.7.15,netifaces==0.10.4,os-testr==0.2.0,oslo.concurrency==2.2.0,oslo.config==1.14.0,oslo.context==0.4.0,oslo.i18n==2.1.0,oslo.log==1.6.0,oslo.serialization==1.7.0,oslo.utils==1.8.0,paramiko==1.15.2,pbr==1.2.0,posix-ipc==1.0.0,prettytable==0.7.2,pyasn1==0.1.8,pycparser==2.14,pycrypto==2.6.1,pyOpenSSL==0.15.1,pyparsing==2.0.3,python-mimeparse==0.1.4,python-subunit==1.1.0,pytz==2015.4,PyYAML==3.11,retrying==1.3.3,six==1.9.0,stevedore==1.6.0,-e git+https://git.openstack.org/openstack/tempest.git@91ebdfac2f6021aa1bccceaf4274bb801cc0891b#egg=tempest-master,tempest-lib==0.6.1,testrepository==0.0.20,testscenarios==0.5.0,testtools==1.8.0,traceback2==1.4.0,unittest2==1.1.0,wheel==0.24.0,wrapt==1.10.5
all runtests: PYTHONHASHSEED='264817387'
all runtests: commands[0] | find . -type f -name *.pyc -delete
all runtests: commands[1] | bash tools/pretty_tox.sh tempest.api.volume.test_volumes_snapshots.VolumesV1SnapshotTestJSON.test_volume_from_snapshot
running testr
running=OS_STDOUT_CAPTURE=${OS_STDOUT_CAPTURE:-1} \
OS_STDERR_CAPTURE=${OS_STDERR_CAPTURE:-1} \
OS_TEST_TIMEOUT=${OS_TEST_TIMEOUT:-500} \
OS_TEST_LOCK_PATH=${OS_TEST_LOCK_PATH:-${TMPDIR:-'/tmp'}} \
${PYTHON:-python} -m subunit.run discover -t ${OS_TOP_LEVEL:-./} ${OS_TEST_PATH:-./tempest/test_discover} --list
running=OS_STDOUT_CAPTURE=${OS_STDOUT_CAPTURE:-1} \
OS_STDERR_CAPTURE=${OS_STDERR_CAPTURE:-1} \
OS_TEST_TIMEOUT=${OS_TEST_TIMEOUT:-500} \
OS_TEST_LOCK_PATH=${OS_TEST_LOCK_PATH:-${TMPDIR:-'/tmp'}} \
${PYTHON:-python} -m subunit.run discover -t ${OS_TOP_LEVEL:-./} ${OS_TEST_PATH:-./tempest/test_discover} --load-list /tmp/tmpjGQTWu
{0} tempest.api.volume.test_volumes_snapshots.VolumesV1SnapshotTestJSON.test_volume_from_snapshot [12.260094s] ... ok
======
Totals
======
Ran: 1 tests in 21.0000 sec.
- Passed: 1
- Skipped: 0
- Expected Fail: 0
- Unexpected Success: 0
- Failed: 0
Sum of execute time for each test: 12.2601 sec.
==============
Worker Balance
==============
- Worker 0 (1 tests) => 0:00:12.260094s
Slowest Tests:
Test id Runtime (s)
-------------------------------------------------------------------------------------------------------------------------------------- -----------
tempest.api.volume.test_volumes_snapshots.VolumesV1SnapshotTestJSON.test_volume_from_snapshot[id-677863d1-3142-456d-b6ac-9924f667a7f4] 12.260
_____________________________________________________________________________________________________ summary _____________________________________________________________________________________________________
all: commands succeeded
congratulations :)
Now, running the whole test class yields:
all develop-inst-nodeps: /opt/stack/tempest
all installed: anyjson==0.3.3,Babel==1.3,boto==2.38.0,cffi==1.1.2,cliff==1.13.0,cmd2==0.6.8,cryptography==0.9.2,debtcollector==0.5.0,ecdsa==0.13,enum34==1.0.4,extras==0.0.3,fasteners==0.12.0,fixtures==1.3.1,functools32==3.2.3.post1,httplib2==0.9.1,idna==2.0,ipaddress==1.0.7,iso8601==0.1.10,jsonschema==2.5.1,linecache2==1.0.0,monotonic==0.2,msgpack-python==0.4.6,netaddr==0.7.15,netifaces==0.10.4,os-testr==0.2.0,oslo.concurrency==2.2.0,oslo.config==1.14.0,oslo.context==0.4.0,oslo.i18n==2.1.0,oslo.log==1.6.0,oslo.serialization==1.7.0,oslo.utils==1.8.0,paramiko==1.15.2,pbr==1.2.0,posix-ipc==1.0.0,prettytable==0.7.2,pyasn1==0.1.8,pycparser==2.14,pycrypto==2.6.1,pyOpenSSL==0.15.1,pyparsing==2.0.3,python-mimeparse==0.1.4,python-subunit==1.1.0,pytz==2015.4,PyYAML==3.11,retrying==1.3.3,six==1.9.0,stevedore==1.6.0,-e git+https://git.openstack.org/openstack/tempest.git@91ebdfac2f6021aa1bccceaf4274bb801cc0891b#egg=tempest-master,tempest-lib==0.6.1,testrepository==0.0.20,testscenarios==0.5.0,testtools==1.8.0,traceback2==1.4.0,unittest2==1.1.0,wheel==0.24.0,wrapt==1.10.5
all runtests: PYTHONHASHSEED='3462064401'
all runtests: commands[0] | find . -type f -name *.pyc -delete
all runtests: commands[1] | bash tools/pretty_tox.sh tempest.api.volume.test_volumes_snapshots.VolumesV1SnapshotTestJSON
running testr
running=OS_STDOUT_CAPTURE=${OS_STDOUT_CAPTURE:-1} \
OS_STDERR_CAPTURE=${OS_STDERR_CAPTURE:-1} \
OS_TEST_TIMEOUT=${OS_TEST_TIMEOUT:-500} \
OS_TEST_LOCK_PATH=${OS_TEST_LOCK_PATH:-${TMPDIR:-'/tmp'}} \
${PYTHON:-python} -m subunit.run discover -t ${OS_TOP_LEVEL:-./} ${OS_TEST_PATH:-./tempest/test_discover} --list
running=OS_STDOUT_CAPTURE=${OS_STDOUT_CAPTURE:-1} \
OS_STDERR_CAPTURE=${OS_STDERR_CAPTURE:-1} \
OS_TEST_TIMEOUT=${OS_TEST_TIMEOUT:-500} \
OS_TEST_LOCK_PATH=${OS_TEST_LOCK_PATH:-${TMPDIR:-'/tmp'}} \
${PYTHON:-python} -m subunit.run discover -t ${OS_TOP_LEVEL:-./} ${OS_TEST_PATH:-./tempest/test_discover} --load-list /tmp/tmp4Nkte1
{0} tempest.api.volume.test_volumes_snapshots.VolumesV1SnapshotTestJSON.test_snapshot_create_get_list_update_delete [4.813857s] ... ok
{0} tempest.api.volume.test_volumes_snapshots.VolumesV1SnapshotTestJSON.test_snapshot_create_with_volume_in_use [24.101065s] ... ok
{0} tempest.api.volume.test_volumes_snapshots.VolumesV1SnapshotTestJSON.test_snapshots_list_details_with_params [2.251190s] ... FAILED
{0} tempest.api.volume.test_volumes_snapshots.VolumesV1SnapshotTestJSON.test_snapshots_list_with_params [2.229117s] ... FAILED
{0} tempest.api.volume.test_volumes_snapshots.VolumesV1SnapshotTestJSON.test_volume_from_snapshot [2.232561s] ... FAILED
==============================
Failed 3 tests - output below:
==============================
[...]
The Quobyte driver runs with nas_secure_file_operations and nas_secure_file_permissions set to true, meaning it never uses root:root as UID:GID and always uses file permissions 660. Some manual testing, however, showed that during the failing test runs image files owned by root:root with mode 660 did occur, e.g.:
-rw-rw---- 1 root root 393216 Jul 9 10:07 volume-afc37512-6c8a-4962-a512-188a59d7720b.12edbd44-cbc3-47c3-82af-1eaa5a0e5339
-rw-r--r-- 1 ubuntu ubuntu 97 Jul 9 10:06 volume-afc37512-6c8a-4962-a512-188a59d7720b.info
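That root:root owner is exactly what the nas_secure options are supposed to rule out. As an illustration of the contract (this is a sketch, not the actual remotefs.py implementation), nas_secure_file_permissions=true implies that every volume file ends up with mode 0660 and is created by the unprivileged service user:

```python
import os
import stat


def apply_secure_file_permissions(path):
    """Sketch of the nas_secure_file_permissions=true contract: volume
    files are chmod'ed to 0660 (rw for owner and group only) and are
    created by the unprivileged service user, so root:root ownership
    should never appear on the mount."""
    mode = stat.S_IRUSR | stat.S_IWUSR | stat.S_IRGRP | stat.S_IWGRP  # 0660
    os.chmod(path, mode)
    # Return the resulting permission bits for verification.
    return os.stat(path).st_mode & 0o777
```

Given that contract, a file like the one above should be ubuntu:ubuntu 660, never root:root.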
Now the big question is: why is this image file owned by root:root? I do not see how the driver could have done that, and furthermore I do not see why this occurs only when the tests are run as a group but not when they are run individually. As far as I can see, the test case does clean up after each test, so that should not be an issue either.
The Quobyte driver can be found e.g. here: https://github.com/openstack/cinder/blob/stable/kilo/cinder/volume/drivers/quobyte.py
An earlier bug report that did not get this deep can be found here: https://bugs.launchpad.net/cinder/+bug/1438603
Small update: we are still debugging this, currently logging and analyzing the operations taking place at the file system level during the tests. So far we have not found a decisive clue as to what is causing this.
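As part of that file-system-level analysis, a simple scan like the following helps to catch root-owned files the moment they appear (the mount path in the comment is just the one from the logs above; run it in a loop while the Tempest group executes):

```python
import os


def find_root_owned(mount_point):
    """Debugging aid: return all files below mount_point that are
    owned by root (UID 0), i.e. the unexpected ownership we see on
    the snapshot image files."""
    hits = []
    for dirpath, _dirnames, filenames in os.walk(mount_point):
        for name in filenames:
            full = os.path.join(dirpath, name)
            try:
                if os.stat(full).st_uid == 0:
                    hits.append(full)
            except OSError:
                pass  # file vanished between walk() and stat()
    return hits


# e.g.: find_root_owned('/opt/stack/data/cinder/mnt')
```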