Tempest test test_create_server_with_personality fails

Bug #1590420 reported by Sofiia Andriichenko
Affects: Mirantis OpenStack (status tracked in 10.0.x)
  10.0.x: Fix Released, Importance: High, Assigned to: Oleg Bondarev
  9.x: Fix Released, Importance: High, Assigned to: Oleg Bondarev

Bug Description

Configuration:
    ISO: 9.0 mos iso #452
Settings:
Compute - QEMU.
Network - Neutron with VLAN segmentation.
Storage Backends - LVM
Additional services - Install Ironic, Install Sahara

In tab Settings->Compute check Nova quotas
In tab Settings->OpenStack Services check enable Install Ceilometer and Aodh
In tab Networks->Other check enable Neutron DVR

Nodes: controller, compute, ironic, cinder, Telemetry - MongoDB

Steps to reproduce:
    1. Deploy the ISO with the configuration described above
    2. Navigate to the controller node
    3. Install git (apt-get install git)
    4. Clone the scripts that deploy Rally + Tempest:
       # git clone https://github.com/obutenko/mos-rally-verify.git
    5. Follow the instructions at https://github.com/obutenko/mos-rally-verify
    6. Execute the necessary steps to deploy Tempest
    7. Run the test in debug mode:
        # rally --debug verify start --regex tempest.api.compute.servers.test_server_personality.ServerPersonalityTestJSON.test_create_server_with_personality

Expected result:
Test passes

Actual result:
Test fails
(see comments)

Reproducibility:
See attachment

Workaround:
---

Impact:
---

Additional information:
Traceback (most recent call last):
  File "/home/rally/.rally/tempest/for-deployment-ccd279e2-e373-430c-826b-8bd97fa9e419/tempest/api/compute/servers/test_server_personality.py", line 72, in test_create_server_with_personality
    'sudo cat %s' % file_path))
  File "/home/rally/.rally/tempest/for-deployment-ccd279e2-e373-430c-826b-8bd97fa9e419/tempest/common/utils/linux/remote_client.py", line 44, in exec_command
    return self.ssh_client.exec_command(cmd)
  File "/home/rally/.rally/tempest/for-deployment-ccd279e2-e373-430c-826b-8bd97fa9e419/tempest/lib/common/ssh.py", line 118, in exec_command
    ssh = self._get_ssh_connection()
  File "/home/rally/.rally/tempest/for-deployment-ccd279e2-e373-430c-826b-8bd97fa9e419/tempest/lib/common/ssh.py", line 88, in _get_ssh_connection
    password=self.password)
tempest.lib.exceptions.SSHTimeout: Connection to the 10.109.4.164 via SSH timed out.

User: cirros, Password: Y6!w~%qGn@0vpEV
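For context, the failing test boots a server with an injected "personality" file and then reads it back over SSH (the `'sudo cat %s' % file_path` call visible in the traceback). A minimal sketch of that flow under stated assumptions: `FakeCompute` and `FakeSSH` below are hypothetical in-memory stand-ins, not real Tempest classes, so the control flow can be followed without a cloud.

```python
# Hedged sketch of the test_create_server_with_personality flow.
# FakeCompute / FakeSSH are stand-ins (assumptions), not tempest code.
import base64


class FakeCompute:
    """Stand-in for the Nova API: records injected personality files."""

    def create_server(self, personality):
        # Personality files are passed base64-encoded, as in the Compute API.
        self.files = {p["path"]: base64.b64decode(p["contents"]).decode()
                      for p in personality}
        return "server-1"


class FakeSSH:
    """Stand-in for the remote client that runs 'sudo cat <path>'."""

    def __init__(self, compute):
        self.compute = compute

    def exec_command(self, cmd):
        path = cmd.split()[-1]           # e.g. 'sudo cat /test.txt'
        return self.compute.files[path]  # KeyError models "No such file"


compute = FakeCompute()
compute.create_server(personality=[{
    "path": "/test.txt",
    "contents": base64.b64encode(b"hello").decode(),
}])
ssh = FakeSSH(compute)
assert ssh.exec_command("sudo cat /test.txt") == "hello"
```

In the real failure, the SSH connection itself times out before the `cat` ever runs, which is why the traceback ends in `_get_ssh_connection` rather than in a command failure.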

Sofiia Andriichenko (sandriichenko) wrote:
Oleksiy Butenko (obutenko) wrote:

The test fails from time to time on CI and cannot be reproduced manually.

Changed in mos:
status: New → Confirmed
importance: Undecided → Medium
assignee: nobody → Oleksiy Butenko (obutenko)
milestone: none → 10.0
milestone: 10.0 → 9.0
description: updated
Sofiia Andriichenko (sandriichenko) wrote:
Changed in mos:
importance: Medium → High
Sergey Shevorakov (sshevorakov) wrote:

Moving to High since it fails one Tempest test case.

Oleksiy Butenko (obutenko) wrote:

cat /var/log/nova/nova-compute.log |grep ERROR
http://paste.openstack.org/show/509206/

cat /var/log/glance-all.log |grep ERROR
http://paste.openstack.org/show/509209/

cat /var/log/swift-all.log |grep ERROR
http://paste.openstack.org/show/509208/

Changed in mos:
assignee: Oleksiy Butenko (obutenko) → MOS Glance (mos-glance)
tags: added: area-glance
Changed in mos:
assignee: MOS Glance (mos-glance) → Oleksiy Butenko (obutenko)
Oleksiy Butenko (obutenko) wrote:

IMHO it's a bug in the CI job, not in Glance or Swift.
I need more time to reproduce and debug this issue.

description: updated
Changed in mos:
assignee: Oleksiy Butenko (obutenko) → MOS Nova (mos-nova)
Dina Belova (dbelova) wrote:

After a conversation with Roman P., I'm moving this bug to 9.0-updates (given the time the team will most likely need to spend on research and a fix, we won't be able to land the fix in 9.0).

Changed in mos:
milestone: 9.0 → 9.0-updates
Alexander Gubanov (ogubanov) wrote:

I've investigated this bug and found that the test fails (from time to time, not always!) when we run a bunch of tests, like:

rally --debug verify start --regex tempest.api.compute.servers.test_server_personality.ServerPersonalityTestJSON
....
{0} tempest.api.compute.servers.test_server_personality.ServerPersonalityTestJSON.test_can_create_server_with_max_number_personality_files [92.704434s] ... ok
{0} tempest.api.compute.servers.test_server_personality.ServerPersonalityTestJSON.test_create_server_with_personality [21.695728s] ... FAILED
{0} tempest.api.compute.servers.test_server_personality.ServerPersonalityTestJSON.test_personality_files_exceed_limit [1.174876s] ... ok
{0} tempest.api.compute.servers.test_server_personality.ServerPersonalityTestJSON.test_rebuild_server_with_personality [54.739215s] ... ok

If the test is run alone:

rally --debug verify start --regex tempest.api.compute.servers.test_server_personality.ServerPersonalityTestJSON.test_create_server_with_personality

it passes.

Note: this is a Neutron DVR configuration:
root@node-3:~# grep '^router_distributed' /etc/neutron/neutron.conf
router_distributed = True

So I added a breakpoint right after line 71
https://github.com/openstack/tempest/blob/master/tempest/api/compute/servers/test_server_personality.py#L71
ran the tests, and found that the instance is running and I can connect to it via its fixed IP inside the namespace, but the floating IP does not answer ICMP.
Also, all networks (internet, router, controller) are reachable from the instance.

Details http://pastebin.com/FMMBa344

Alexander Gubanov (ogubanov) wrote:

Two more notes:

- To reproduce this bug you should deploy OpenStack with "DVR" and "Nova quotas" enabled.

- I found a difference between the Tempest code on CI (where this test fails from time to time) and the upstream Tempest code, because the MOS QA CI uses a pinned Tempest commit (I don't know why).
Here are the details from the CI Tempest jobs:
git clone https://git.openstack.org/openstack/tempest && cd tempest && git checkout 63cb9a3718f394c9da8e0cc04b170ca2a8196ec2

Alexander Gubanov (ogubanov) wrote:

The bug also reproduces with the latest upstream Tempest code, so you can ignore my previous comment.

Oleg Bondarev (obondarev) wrote:

From the logs I can see that the DVR l3 agent on the compute node does not properly handle the case where a floating IP is moved from one VM to another while both VMs are on the same compute node. This might also happen with non-DVR routers/floating IPs.

Oleg Bondarev (obondarev) wrote:

To clarify: 'moved' means associated with a new VM without first being disassociated from the old VM. The API allows this, so the l3 agent should handle such a case properly.

Workaround: disassociate the floating IP before associating it with another VM.
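The workaround above can be sketched with an in-memory stand-in; `FloatingIPAPI` here is hypothetical (an assumption), not the real Neutron client, and only models the association bookkeeping:

```python
# Hedged sketch of the workaround: always disassociate a floating IP before
# associating it with another VM, so the l3 agent never sees an implicit move.
class FloatingIPAPI:
    """Stand-in for the floating-IP association calls (not real neutronclient)."""

    def __init__(self):
        self._assoc = {}  # floating_ip -> port_id

    def associate(self, fip, port_id):
        # The real API allows this even while fip is still bound elsewhere,
        # which is exactly the case the DVR l3 agent mishandled.
        self._assoc[fip] = port_id

    def disassociate(self, fip):
        self._assoc.pop(fip, None)

    def port_of(self, fip):
        return self._assoc.get(fip)


def safe_move(api, fip, new_port):
    """Move a floating IP with an explicit disassociate first (the workaround)."""
    api.disassociate(fip)
    api.associate(fip, new_port)


api = FloatingIPAPI()
api.associate("10.109.4.164", "port-vm1")
safe_move(api, "10.109.4.164", "port-vm2")
assert api.port_of("10.109.4.164") == "port-vm2"
```

With the real client the same pattern would be two API calls (disassociate, then associate) instead of a single re-association request.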

Oleg Bondarev (obondarev) wrote:

Filed upstream: https://bugs.launchpad.net/neutron/+bug/1599089
Fix upstream: https://review.openstack.org/#/c/337591/ (will be backported to stable/mitaka)

Oleg Bondarev (obondarev) wrote:
Sofiia Andriichenko (sandriichenko) wrote:

Reproduced on 9.1 snapshot #140

Settings:
Compute - QEMU.
Network - Neutron with VLAN segmentation.
Storage Backends - LVM
Additional services - Install Ironic, Install Sahara

In tab Settings->Compute check Nova quotas
In tab Settings->OpenStack Services check enable Install Ceilometer and Aodh
In tab Networks->Other check enable Neutron DVR
In tab Settings->Security check enable TLS for OpenStack public endpoints, HTTPS for Horizon

Nodes: controller, compute, ironic, cinder, Telemetry - MongoDB

Trace:

Traceback (most recent call last):
  File "/home/rally/.rally/tempest/for-deployment-246abb07-bd73-4ac4-91f8-8c6031e51e84/tempest/api/compute/servers/test_server_personality.py", line 74, in test_create_server_with_personality
    'sudo cat %s' % file_path))
  File "/home/rally/.rally/tempest/for-deployment-246abb07-bd73-4ac4-91f8-8c6031e51e84/tempest/common/utils/linux/remote_client.py", line 55, in wrapper
    six.reraise(*original_exception)
  File "/home/rally/.rally/tempest/for-deployment-246abb07-bd73-4ac4-91f8-8c6031e51e84/tempest/common/utils/linux/remote_client.py", line 36, in wrapper
    return function(self, *args, **kwargs)
  File "/home/rally/.rally/tempest/for-deployment-246abb07-bd73-4ac4-91f8-8c6031e51e84/tempest/common/utils/linux/remote_client.py", line 92, in exec_command
    return self.ssh_client.exec_command(cmd)
  File "/home/rally/.rally/tempest/for-deployment-246abb07-bd73-4ac4-91f8-8c6031e51e84/tempest/lib/common/ssh.py", line 118, in exec_command
    ssh = self._get_ssh_connection()
  File "/home/rally/.rally/tempest/for-deployment-246abb07-bd73-4ac4-91f8-8c6031e51e84/tempest/lib/common/ssh.py", line 88, in _get_ssh_connection
    password=self.password)
tempest.lib.exceptions.SSHTimeout: Connection to the 10.109.4.170 via SSH timed out.

User: cirros, Password: Z5~+kIl5CwVP%lx

snapshot https://drive.google.com/a/mirantis.com/file/d/0BxPLDs6wcpbDblMtekMwbldXVzQ/view?usp=sharing

Sofiia Andriichenko (sandriichenko) wrote:

This test also fails on another configuration, with a different error.
9.1 snapshot #140

Settings:
Storage Backends - Ceph RBD for volumes (Cinder), Ceph RBD for ephemeral volumes (Nova), Ceph RBD for images (Glance), Ceph RadosGW for objects (Swift API)
Additional services - Install Sahara

In tab Settings->Compute check Nova quotas
In tab Settings->OpenStack Services check enable Install Ceilometer and Aodh
In tab Networks->Other check enable Neutron DVR
In tab Settings->Security check enable TLS for OpenStack public endpoints, HTTPS for Horizon

Nodes: controller, compute, Ceph, Telemetry - MongoDB

Trace:

Traceback (most recent call last):
  File "/home/rally/.rally/tempest/for-deployment-8e099e9b-aee7-4436-8331-d3d4b229eb3b/tempest/api/compute/servers/test_server_personality.py", line 74, in test_create_server_with_personality
    'sudo cat %s' % file_path))
  File "/home/rally/.rally/tempest/for-deployment-8e099e9b-aee7-4436-8331-d3d4b229eb3b/tempest/common/utils/linux/remote_client.py", line 36, in wrapper
    return function(self, *args, **kwargs)
  File "/home/rally/.rally/tempest/for-deployment-8e099e9b-aee7-4436-8331-d3d4b229eb3b/tempest/common/utils/linux/remote_client.py", line 92, in exec_command
    return self.ssh_client.exec_command(cmd)
  File "/home/rally/.rally/tempest/for-deployment-8e099e9b-aee7-4436-8331-d3d4b229eb3b/tempest/lib/common/ssh.py", line 168, in exec_command
    stderr=err_data, stdout=out_data)
tempest.lib.exceptions.SSHExecCommandFailed: Command 'set -eu -o pipefail; PATH=$PATH:/sbin; sudo cat /test.txt', exit status: 1, stderr:
cat: can't open '/test.txt': No such file or directory
stdout:

snapshot https://drive.google.com/a/mirantis.com/file/d/0BxPLDs6wcpbDTFBMTkZsdFFkcUE/view?usp=sharing

Oleg Bondarev (obondarev) wrote:

The snapshot shows that the fix is not working for some reason; going to revert the env and debug.

Oleg Bondarev (obondarev) wrote:

On a reverted env the test passes, and the logs show that the fix is working. Please reopen if it occurs again.

Oleg Bondarev (obondarev) wrote:

The failure from comment #14 might have the same root cause as described in comment #4 of https://bugs.launchpad.net/mos/+bug/1611772

tags: added: on-verification
Alexander Gubanov (ogubanov) wrote:
tags: removed: on-verification
Dmitry Belyaninov (dbelyaninov) wrote: