[tempest] 2 Nova-related Tempest test failed

Bug #1556819 reported by Timur Nurlygayanov on 2016-03-14
This bug affects 1 person
Affects: Mirantis OpenStack (status tracked in 10.0.x)

Series  Importance  Assigned to
10.0.x  High        Roman Podoliaka
8.0.x   High        Sergii Rizvan
9.x     High        Roman Podoliaka

Bug Description

Detailed bug description:
One Nova-related Tempest test failed on a configuration with detached components. The root cause appears to be in the configuration of services with detached components (when the database and the Keystone service are located on separate hosts).
The bug was found on MOS 8.0; separate components don't work with MOS 9.0 for now (we need to update plugins).

Important: the issue was also reproduced on a configuration without any plugins.

Steps To Reproduce:
1. Deploy OpenStack cluster with the following configuration: Neutron VLANs, enable DVR, enable SAHARA, enable MURANO, enable IRONIC, enable MONGO, HA, Cinder LVM, Swift. Add nodes with separate database and separate Keystone service.
2. Run the full Tempest test suite.

Expected Results:
All Tempest tests passed

Actual Results:
1 Nova-related test failed:
test_can_create_server_with_max_number_personality_files

Error:
Traceback (most recent call last):
  File "tempest/api/compute/servers/test_server_personality.py", line 137, in test_can_create_server_with_max_number_personality_files
    'sudo cat %s' % i['path']))
  File "/home/rally/.rally/tempest/for-deployment-6153d10c-0783-4032-be79-19b3b254adba/.venv/local/lib/python2.7/site-packages/testtools/testcase.py", line 362, in assertEqual
    self.assertThat(observed, matcher, message)
  File "/home/rally/.rally/tempest/for-deployment-6153d10c-0783-4032-be79-19b3b254adba/.venv/local/lib/python2.7/site-packages/testtools/testcase.py", line 447, in assertThat
    raise mismatch_error
testtools.matchers._impl.MismatchError: 'This is a test file.' != u''

Reproducibility:
100 %

Workaround:
We don't know the workaround yet.

Impact:
All users who install separate components with MOS 8.0 are affected.
We will also target the issue for MOS 9.0 so that the fix is not missed there.

tags: added: area-nova
tags: added: tempest
summary: - [tempest] [separate components] 3 Nova-related Tempest tests failed on
+ [tempest] [separate components] 1 Nova-related Tempest test failed on
configuration with detached components
description: updated

It affects two scenarios with the same error:
test_create_server_with_personality
test_can_create_server_with_max_number_personality_files

Error Message

test failed

Stacktrace

Traceback (most recent call last):
  File "tempest/api/compute/servers/test_server_personality.py", line 137, in test_can_create_server_with_max_number_personality_files
    'sudo cat %s' % i['path']))
  File "tempest/common/utils/linux/remote_client.py", line 44, in exec_command
    return self.ssh_client.exec_command(cmd)
  File "tempest/lib/common/ssh.py", line 168, in exec_command
    stderr=err_data, stdout=out_data)
tempest.lib.exceptions.SSHExecCommandFailed: Command 'set -eu -o pipefail; PATH=$PATH:/sbin; sudo cat /etc/test0.txt', exit status: 1, stderr:
cat: can't open '/etc/test0.txt': No such file or directory

summary: - [tempest] [separate components] 1 Nova-related Tempest test failed on
+ [tempest] [separate components] 2 Nova-related Tempest tests failed on
configuration with detached components

Timur, the error Sergii mentioned in the comment above has nothing to do with the fact you have separate nodes for keystone and database: it says that file injection did not work for some reason.

Are you sure the very same test passes in your usual runs (without separated components)? When ephemeral disks are stored in Ceph, it's expected that file injection won't work.

Hi Roman,

Yes, it looks like we have two different bugs.
The following test failed for default configuration without separate components:

test_can_create_server_with_max_number_personality_files

description: updated
summary: - [tempest] [separate components] 2 Nova-related Tempest tests failed on
+ [tempest] [separate components] 1 Nova-related Tempest tests failed on
configuration with detached components
summary: - [tempest] [separate components] 1 Nova-related Tempest tests failed on
- configuration with detached components
+ [tempest] 1 Nova-related Tempest test failed
description: updated

In the latest Tempest run, two tests failed:

test_can_create_server_with_max_number_personality_files
test_create_server_with_personality

summary: - [tempest] 1 Nova-related Tempest test failed
+ [tempest] 2 Nova-related Tempest test failed
Roman Podoliaka (rpodolyaka) wrote :

Timur, these tests rely on file injection, which we *explicitly* disable when Ceph is used for ephemerals (because it's not supported):

http://git.openstack.org/cgit/openstack/fuel-library/tree/deployment/puppet/ceph/manifests/ephemeral.pp#n11
https://github.com/openstack/nova/blob/master/nova/virt/libvirt/driver.py#L152-L156

E.g. you can see that tests using file injection are not enabled for Ceph deployments in OSTF:

https://github.com/openstack/fuel-ostf/blob/37c5d6113408a29cabe0f416fe99cf20e2bca318/fuel_health/tests/smoke/test_nova_create_instance_with_connectivity.py#L295-L308

I suggest we tweak our tempest configuration to skip these two tests in deployments, when Ceph is used for ephemeral disks in Nova.

tags: added: area-qa
removed: area-nova

Roman, thank you!
We are going to add these tests to the list of skipped tests for configurations with Ceph.

Oleksiy Butenko (obutenko) wrote :

Skip decorators were added to ci-jobs.
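For illustration only, a skip of this kind could look roughly like the sketch below. It uses plain unittest rather than the actual ci-jobs change, and CEPH_EPHEMERAL is a hypothetical flag that a real job would derive from the deployment description; the test bodies are placeholders.

```python
import unittest

# Hypothetical flag (assumption): a real job would read this from the
# deployment description or the Tempest configuration.
CEPH_EPHEMERAL = True

# Reusable decorator: skip when file injection cannot work by design.
skip_if_ceph_ephemeral = unittest.skipIf(
    CEPH_EPHEMERAL,
    "file injection is disabled when Nova ephemerals are stored in Ceph",
)


class ServerPersonalitySketch(unittest.TestCase):
    """Placeholder tests named after the two failing Tempest tests."""

    @skip_if_ceph_ephemeral
    def test_create_server_with_personality(self):
        # Placeholder body: the real test boots a server and reads the file.
        self.fail("should have been skipped on a Ceph deployment")

    @skip_if_ceph_ephemeral
    def test_can_create_server_with_max_number_personality_files(self):
        # Placeholder body: the real test injects the maximum number of files.
        self.fail("should have been skipped on a Ceph deployment")


if __name__ == "__main__":
    unittest.main()
```

With the flag set, both tests are reported as skipped instead of failed, which keeps the Ceph runs green without hiding failures on other backends.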

Sergii Turivnyi (sturivnyi) wrote :

Detailed bug description:
ISO: http://paste.openstack.org/show/492962/
Configuration: [9.0][MOSQA] Tempest 9.0 (VLAN_CINDER-TEMPEST,HA,Cinder LVM,Swift(api))

Steps to reproduce:
Deploy the ISO in the configuration given above (see Detailed bug description).

Expected results:
Test is passed

Actual result:
Test failed

Reproducibility:
See attachment

Workaround:
---

Impact:
---

Description of the environment:
See (Detailed bug description)

Additional information:
Error Message

test failed

Stacktrace

Traceback (most recent call last):
  File "tempest/api/compute/servers/test_server_personality.py", line 137, in test_can_create_server_with_max_number_personality_files
    'sudo cat %s' % i['path']))
  File "tempest/common/utils/linux/remote_client.py", line 44, in exec_command
    return self.ssh_client.exec_command(cmd)
  File "tempest/lib/common/ssh.py", line 168, in exec_command
    stderr=err_data, stdout=out_data)
tempest.lib.exceptions.SSHExecCommandFailed: Command 'set -eu -o pipefail; PATH=$PATH:/sbin; sudo cat /etc/test0.txt', exit status: 1, stderr:
cat: can't open '/etc/test0.txt': No such file or directory

Sergii Turivnyi (sturivnyi) wrote :

Reproduced on customer environment with MOS 8.0, baremetal lab with 10+ hardware servers, Ceph+RadosGW backend, HA.

Oleksiy Butenko (obutenko) wrote :

test_can_create_server_with_max_number_personality_files and
test_create_server_with_personality failed on an environment with LVM
(MOS ISO 289).

Oleksiy Butenko (obutenko) wrote :

contact me for access to the environment where tests are failing.

Oleksiy Butenko (obutenko) wrote :

Can reproduce on Lab:
Controller+Mongo, Compute+LVM, Ironic

2016-05-11 14:53:13.959 13173 ERROR nova.scheduler.utils [req-2e695f62-8fc4-4993-8f02-895fb6a35f4e fa10c65c31e3438592813bcda8e0901c 56c4235eb070458b98124f05d30dc6ef - - -] [instance: 18b96cbe-be34-43f0-a5c0-8486727064c7] Error from last host: node-6.test.domain.local (node node-6.test.domain.local): [u'Traceback (most recent call last):\n', u' File "/usr/lib/python2.7/dist-packages/nova/compute/manager.py", line 1926, in _do_build_and_run_instance\n filter_properties)\n', u' File "/usr/lib/python2.7/dist-packages/nova/compute/manager.py", line 2116, in _build_and_run_instance\n instance_uuid=instance.uuid, reason=six.text_type(e))\n', u'RescheduledException: Build of instance 18b96cbe-be34-43f0-a5c0-8486727064c7 was re-scheduled: partition search unsupported with nbd\n']
2016-05-11 14:53:14.075 13172 DEBUG oslo_messaging._drivers.amqpdriver [req-2e695f62-8fc4-4993-8f02-895fb6a35f4e fa10c65c31e3438592813bcda8e0901c 56c4235eb070458b98124f05d30dc6ef - - -] REPLY msg_id: 7b5c5ab3686c4924addebfb136c39293 size: 146 reply queue: reply_de8bd1844e2141aba910eac76d7c721b time elapsed: 0.193683430996s _send_reply /usr/lib/python2.7/dist-packages/oslo_messaging/_drivers/amqpdriver.py:96
2016-05-11 14:53:14.125 13173 DEBUG oslo_messaging._drivers.amqpdriver [req-2e695f62-8fc4-4993-8f02-895fb6a35f4e fa10c65c31e3438592813bcda8e0901c 56c4235eb070458b98124f05d30dc6ef - - -] CALL msg_id: b1aec354709b4c008a26a8600b0e785c size: 5682 exchange: nova topic: scheduler _send /usr/lib/python2.7/dist-packages/oslo_messaging/_drivers/amqpdriver.py:496
2016-05-11 14:53:14.210 13173 DEBUG oslo_messaging._drivers.amqpdriver [-] received reply msg_id: b1aec354709b4c008a26a8600b0e785c size: 1077 __call__ /usr/lib/python2.7/dist-packages/oslo_messaging/_drivers/amqpdriver.py:339
2016-05-11 14:53:14.211 13173 WARNING nova.scheduler.utils [req-2e695f62-8fc4-4993-8f02-895fb6a35f4e fa10c65c31e3438592813bcda8e0901c 56c4235eb070458b98124f05d30dc6ef - - -] Failed to compute_task_build_instances: No valid host was found. There are not enough hosts available.
Traceback (most recent call last):

  File "/usr/lib/python2.7/dist-packages/oslo_messaging/rpc/server.py", line 150, in inner
    return func(*args, **kwargs)

  File "/usr/lib/python2.7/dist-packages/nova/scheduler/manager.py", line 104, in select_destinations
    dests = self.driver.select_destinations(ctxt, spec_obj)

  File "/usr/lib/python2.7/dist-packages/nova/scheduler/filter_scheduler.py", line 74, in select_destinations
    raise exception.NoValidHost(reason=reason)

NoValidHost: No valid host was found. There are not enough hosts available.

2016-05-11 14:53:14.212 13173 WARNING nova.scheduler.utils [req-2e695f62-8fc4-4993-8f02-895fb6a35f4e fa10c65c31e3438592813bcda8e0901c 56c4235eb070458b98124f05d30dc6ef - - -] [instance: 18b96cbe-be34-43f0-a5c0-8486727064c7] Setting instance to ERROR state.

tags: removed: area-qa
tags: added: area-nova
Roman Podoliaka (rpodolyaka) wrote :

Folks, as I already stated in the comment https://bugs.launchpad.net/mos/+bug/1556819/comments/5
this test case involves usage of file injection, which *can* be skipped depending on the ephemeral storage used:

https://github.com/openstack/nova/blob/master/nova/virt/libvirt/driver.py#L3086-L3093

File injection is explicitly disabled in our Puppet manifests for RBD ephemerals:

http://git.openstack.org/cgit/openstack/fuel-library/tree/deployment/puppet/openstack_tasks/manifests/roles/compute.pp#n352

This *must not* pass when ephemerals are stored in Ceph.
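A quick way to tell whether a given compute node has injection turned off is to look at the [libvirt] inject_partition option in nova.conf, where Nova documents -2 as "disable injection". The sketch below is only an illustration: the sample fragment and the helper name file_injection_disabled are assumptions, not taken from the manifests linked above.

```python
import configparser

# Sample nova.conf fragment; the values are assumptions for illustration.
NOVA_CONF = """
[libvirt]
images_type = rbd
inject_partition = -2
"""


def file_injection_disabled(conf_text):
    """Return True if the given nova.conf text disables file injection."""
    parser = configparser.ConfigParser()
    parser.read_string(conf_text)
    # -2 means "disable injection"; it is also used here as the fallback
    # when the option is not set at all.
    value = parser.getint("libvirt", "inject_partition", fallback=-2)
    return value == -2


if __name__ == "__main__":
    print(file_injection_disabled(NOVA_CONF))
```

If this returns True for a compute node, the personality tests cannot pass there regardless of the Tempest setup.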

Roman Podoliaka (rpodolyaka) wrote :

If it's reproduced without Ceph, let's have a live debugging session on such environment.

tags: added: area-qa
removed: area-nova

QA environment is provided, debugging is in progress.

Roman Podoliaka (rpodolyaka) wrote :

I checked Oleksii's environment and it looks interesting: file injection actually works and test_can_create_server_with_max_number_personality_files is the only failing test (out of 4 personality tests).

Tempest log shows that out of 50 injected files (default limit is 50), 40 were injected successfully:

https://github.com/openstack/tempest/blob/198e5b4b871c3d09c20afb56dca9637a8cf86ac8/tempest/api/compute/servers/test_server_personality.py#L109-L141
http://paste.openstack.org/show/496936/

The 41st file (test40.txt) was empty, while it should have contained 'This is a test file.'.
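To make the check concrete, here is a minimal sketch of what the test sends and verifies. It assumes the usual Nova API shape for personality files (a path plus base64-encoded contents); build_personality and find_bad_files are hypothetical helpers, not Tempest code, and the observed contents are simulated rather than read over SSH.

```python
import base64


def build_personality(count, content="This is a test file."):
    """Build a personality list with paths /etc/test0.txt, /etc/test1.txt, ..."""
    encoded = base64.b64encode(content.encode()).decode()
    return [
        {"path": "/etc/test%d.txt" % i, "contents": encoded}
        for i in range(count)
    ]


def find_bad_files(personality, observed):
    """Return the paths whose observed content differs from what was injected.

    `observed` maps a path to the text read back from the guest; a missing
    path is treated as an empty file.
    """
    bad = []
    for item in personality:
        expected = base64.b64decode(item["contents"]).decode()
        if observed.get(item["path"], "") != expected:
            bad.append(item["path"])
    return bad


if __name__ == "__main__":
    personality = build_personality(50)
    # Simulate the situation from the log: the first 40 files are intact.
    observed = {p["path"]: "This is a test file." for p in personality[:40]}
    print(find_bad_files(personality, observed)[0])
```

In the simulated case the first mismatching path is /etc/test40.txt, which matches the point where the real test stopped.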

At the same time there are the following errors in dmesg:

http://paste.openstack.org/show/496933/

I'm wondering if it can be a fallout of the recent update of TestVM image.

MOS Linux, could you please assist with debugging this further? The good thing is that it's reproduced every time, so you simply need a 9.0 environment with the default settings for ephemeral storage.

tags: added: area-linux
removed: area-qa
Albert Syriy (asyriy) wrote :

Actually the image is not the cause of the issue, because the bug has been easily reproduced with the original cirros-0.3.4 image too.
Further investigation showed that the image under test was unmounted while the test files were still being written to it, and as a result the file system is left in a corrupted state. It looks like a race condition between the write and unmount procedures in the test.

Alexander Gubanov (ogubanov) wrote :

I've been digging into this issue
- with other images (official cirros release)
- with downgraded qemu packages

and got the same situation: from time to time not all files were injected.

I guess that the root cause is in the nbd module.
On the compute node, /var/log/sudo.log or /var/log/nova/nova-compute.log (with debug=true) shows the following sequence of commands:

qemu-nbd -c /dev/nbd12 /var/lib/nova/instances/16e7efb2-ccbb-411d-83f4-c629dffcc152/disk
mount /dev/nbd12p1 /tmp/openstack-vfs-localfsXC7IqB
readlink -nm /tmp/openstack-vfs-localfsXC7IqB/etc
readlink -e /tmp/openstack-vfs-localfsXC7IqB/etc
readlink -nm /tmp/openstack-vfs-localfsXC7IqB/etc/test0.txt
tee /tmp/openstack-vfs-localfsXC7IqB/etc/test0.txt
readlink -nm /tmp/openstack-vfs-localfsXC7IqB/etc
readlink -e /tmp/openstack-vfs-localfsXC7IqB/etc
...
blockdev --flushbufs /dev/nbd12
umount /dev/nbd12p1
qemu-nbd -d /dev/nbd12

All of them return code 0, but at the same time we get errors in /var/log/syslog: attempts to send on a closed socket.
So the socket is closed before "blockdev" and "umount" have flushed the buffers to disk.
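As a generic illustration of the failure mode (this is not Nova's injection code), data that is only sitting in buffers is lost if the backing device goes away before it is flushed. The sketch below shows the usual defensive pattern of forcing a file to stable storage right after writing it; write_durably is a hypothetical helper and the temporary directory stands in for the mounted image.

```python
import os
import tempfile


def write_durably(path, data):
    """Write text to path and force it to disk before returning."""
    with open(path, "w") as handle:
        handle.write(data)
        # Push Python's userspace buffer down to the kernel...
        handle.flush()
        # ...and ask the kernel to write its buffers to the device.
        os.fsync(handle.fileno())


if __name__ == "__main__":
    # The temporary directory stands in for the mounted guest image.
    with tempfile.TemporaryDirectory() as mount_point:
        target = os.path.join(mount_point, "test0.txt")
        write_durably(target, "This is a test file.")
        with open(target) as handle:
            print(handle.read())
```

This only narrows the window on the writer's side; if the nbd connection itself is torn down too early, the ordering of the disconnect still has to be fixed.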

But now this test passes every time on MOS 9.0 (build 376) and even on old ISOs (where it failed before) with different images.
Since we can't reproduce this issue, I moved it to Incomplete. Please reopen the ticket if the issue is reproduced again.

Alex Ermolov (aermolov) on 2016-05-23
tags: added: non-release
Alexander Gubanov (ogubanov) wrote :

I found another strange behaviour with file_injection
details http://pastebin.com/0aiEQPTy

Roman Podoliaka (rpodolyaka) wrote :

^ the fix is merged to master (10.0), a backport to 9.0 is proposed here https://review.openstack.org/#/c/322434/

Roman Podoliaka (rpodolyaka) wrote :

Removing the non-release tag: this is a problem with nbd file injection, not with Tempest tests.

tags: removed: non-release
tags: added: release-notes
Roman Podoliaka (rpodolyaka) wrote :

Adding release-notes tag - we should document the fact, that we now explicitly disable file injection and force instances to have a config drive.

Roman Podoliaka (rpodolyaka) wrote :

https://review.openstack.org/#/c/322434/ is merged to stable/mitaka

Sergii Rizvan (srizvan) wrote :

It's pretty dangerous to include the fix in published releases, because it may cause issues in other components (for example: https://bugs.launchpad.net/mos/+bug/1587919). That's why we are going to close the bug for 8.0 as Won't Fix.

tags: added: wontfix-risky
Alexander Gubanov (ogubanov) wrote :

I've verified file_injection on MOS 9.0 (build 448) - works!
Proof/details: http://pastebin.com/DPZGnG2x

Related fix proposed to branch: master
Change author: Evgeny Konstantinov <email address hidden>
Review: https://review.fuel-infra.org/22320

Reviewed: https://review.fuel-infra.org/22320
Submitter: Evgeny Konstantinov <email address hidden>
Branch: master

Commit: 2845bca5e1ccec1e894672bf3fc1c07493619608
Author: Evgeny Konstantinov <email address hidden>
Date: Wed Jun 22 10:30:56 2016

Add Nova resolved issues 9.0

Change-Id: Ia8c83f6dfffa2df143cf5c494e41e8676aff8028
Related-Bug: #1556819
Related-Bug: #1471172

tags: added: release-notes-done
removed: release-notes