[OSTF] OSTF tests fail if the compute node they should run on is offline, even though another compute node is available

Bug #1323252 reported by Andrey Sledzinskiy on 2014-05-26
Affects | Status | Importance | Assigned to | Milestone
Fuel for OpenStack | | Medium | Unassigned |
4.1.x | | Medium | Nastya Urlapova |
5.0.x | | Medium | Nastya Urlapova |

Bug Description

Reproduced on {"build_id": "2014-05-25_23-01-31", "mirantis": "yes", "build_number": "22", "ostf_sha": "1f020d69acbf50be00c12c29564f65440971bafe", "nailgun_sha": "bd09f89ef56176f64ad5decd4128933c96cb20f4", "production": "docker", "api": "1.0", "fuelmain_sha": "db2d153e62cb2b3034d33359d7e3db9d4742c811", "astute_sha": "a7eac46348dc77fc2723c6fcc3dbc66cc1a83152", "release": "5.0", "fuellib_sha": "b9985e42159187853edec82c406fdbc38dc5a6d0"}

http://jenkins-product.srt.mirantis.net:8080/view/5.0_swarm/job/5.0_fuelmain.system_test.centos.thread_5/5/testReport/junit/%28root%29/ceph_ha_restart/ceph_ha_restart/

Traceback (most recent call last):
  File "/home/jenkins/venv-nailgun-tests/local/lib/python2.7/site-packages/proboscis/case.py", line 296, in testng_method_mistake_capture_func
    compatability.capture_type_error(s_func)
  File "/home/jenkins/venv-nailgun-tests/local/lib/python2.7/site-packages/proboscis/compatability/exceptions_2_6.py", line 27, in capture_type_error
    func()
  File "/home/jenkins/venv-nailgun-tests/local/lib/python2.7/site-packages/proboscis/case.py", line 350, in func
    func(test_case.state.get_state())
  File "/home/jenkins/workspace/5.0_fuelmain.system_test.centos.thread_5/fuelweb_test/helpers/decorators.py", line 49, in wrapper
    return func(*args, **kwagrs)
  File "/home/jenkins/workspace/5.0_fuelmain.system_test.centos.thread_5/fuelweb_test/tests/tests_strength/test_restart.py", line 144, in ceph_ha_restart
    self.fuel_web.run_ostf(cluster_id=cluster_id)
  File "/home/jenkins/workspace/5.0_fuelmain.system_test.centos.thread_5/fuelweb_test/__init__.py", line 48, in wrapped
    result = func(*args, **kwargs)
  File "/home/jenkins/workspace/5.0_fuelmain.system_test.centos.thread_5/fuelweb_test/models/fuel_web_client.py", line 501, in run_ostf
    failed_test_name=failed_test_name)
  File "/home/jenkins/workspace/5.0_fuelmain.system_test.centos.thread_5/fuelweb_test/__init__.py", line 48, in wrapped
    result = func(*args, **kwargs)
  File "/home/jenkins/workspace/5.0_fuelmain.system_test.centos.thread_5/fuelweb_test/models/fuel_web_client.py", line 174, in assert_ostf_run
    failed_tests_res))
AssertionError: Failed tests, fails: 3 should fail: 0 failed tests name: [{u'Check internet connectivity from a compute': u"Time limit exceeded while waiting for 'ping' command to finish. Please refer to OpenStack logs for more details."}, {u'Check DNS resolution on compute node': u'Instance is not reachable by IP. Please refer to OpenStack logs for more details.'}, {u'Create volume and attach it to instance': u'Timed out waiting to become ACTIVE Please refer to OpenStack logs for more details.'}]

The tests failed because the system test destroys a compute node and OSTF then tries to run its checks on the destroyed compute node (this node's id is 1).
OSTF should check whether the node is online and, if it is offline, run the test on another compute node.
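
A minimal sketch of the requested behaviour, not the actual OSTF code: pick a compute node only if it is reported online. The node dictionaries and the helper name `pick_online_compute` are hypothetical; only the `online` flag and the compute role are taken from this report.

```python
def pick_online_compute(nodes):
    """Return the first compute node reported online, or None.

    `nodes` is a hypothetical list of dicts with "id", "roles" and
    "online" keys; the real nailgun payload may be shaped differently.
    """
    for node in nodes:
        if "compute" in node.get("roles", []) and node.get("online"):
            return node
    return None


# Node 1 is the destroyed compute, so the check should move to node 2.
nodes = [
    {"id": 1, "roles": ["compute"], "online": False},
    {"id": 2, "roles": ["compute"], "online": True},
    {"id": 3, "roles": ["controller"], "online": True},
]
chosen = pick_online_compute(nodes)
```

If no compute node is online the helper returns None, and the caller can skip or fail the check explicitly instead of timing out on SSH.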

Logs are attached

Changed in fuel:
status: New → Confirmed
importance: High → Medium
Tatyanka (tatyana-leontovich) wrote :

There are several issues:
1. We should expect 1 failure in this scenario (from the nova-manage output test: since one compute node is destroyed, we will have XXX for the network and compute services).

2. Verify compute node connectivity only from online nodes (this should be fixed in OSTF).

3. Fix the setting for the volume backend (we set both Cinder LVM and Ceph to True). If we leave only Ceph, delete the cinder node.
If we decide to keep the cinder node, we should expect a failure in the volume test (OSTF), because we destroy the node with cinder.
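
As an illustration of point 1, a system test can state how many OSTF failures it expects after a node is destroyed. The sketch below is hypothetical and is not the `assert_ostf_run` implementation from fuelweb_test; it only mirrors the message format and the list-of-dicts shape seen in the traceback above.

```python
def assert_expected_failures(failed_tests, should_fail=0):
    """Raise AssertionError unless exactly `should_fail` tests failed.

    `failed_tests` is assumed to be a list of {test name: message}
    dicts, as printed in the AssertionError in this report.
    """
    if len(failed_tests) != should_fail:
        names = [name for test in failed_tests for name in test]
        raise AssertionError(
            "Failed tests, fails: %d should fail: %d failed tests name: %s"
            % (len(failed_tests), should_fail, names))


# After one compute node is destroyed, one failure is expected.
failed = [{"Check DNS resolution on compute node":
           "Instance is not reachable by IP."}]
assert_expected_failures(failed, should_fail=1)
```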

Reviewed: https://review.openstack.org/95732
Committed: https://git.openstack.org/cgit/stackforge/fuel-ostf/commit/?id=ab76323cdb6f9c43f9e97177aa920c5452029eef
Submitter: Jenkins
Branch: master

commit ab76323cdb6f9c43f9e97177aa920c5452029eef
Author: Tatyana Leontovich <email address hidden>
Date: Tue May 27 14:37:50 2014 +0300

    5.1 Permit infrastructure test only if node is online

    Permit sanity infrastructure tests only if node is online

    Change-Id: I174120b313fd907e2c41c85dc3cf57a2ea0f35bd
    Related-Bug: #1323252

Reviewed: https://review.openstack.org/95740
Committed: https://git.openstack.org/cgit/stackforge/fuel-main/commit/?id=617bb6e09097d6f252b17256234a540b032b9fbc
Submitter: Jenkins
Branch: master

commit 617bb6e09097d6f252b17256234a540b032b9fbc
Author: Tatyana Leontovich <email address hidden>
Date: Tue May 27 15:05:38 2014 +0300

    5.1 Fix ceph restart tests

    Disable volume_lvm if ceph is using for volumes
    Delete cinder role if cinder is not use
    Change expected ostf test fail form 0 to 1 after compute node's
    destroyed

    Change-Id: I68aafc6bc69ed12b4c4c5fd8dd8d749da1cd0369
    Related-Bug: #1323252

Changed in fuel:
status: Confirmed → Fix Committed
Changed in fuel:
status: Fix Committed → Fix Released

Reviewed: https://review.openstack.org/97090
Committed: https://git.openstack.org/cgit/stackforge/fuel-main/commit/?id=8bbf072f19957eed45a41881f424253faa2423fb
Submitter: Jenkins
Branch: stable/4.1

commit 8bbf072f19957eed45a41881f424253faa2423fb
Author: Tatyana Leontovich <email address hidden>
Date: Tue May 27 15:05:38 2014 +0300

    4.1 Fix ceph restart tests

    Disable volume_lvm if ceph is using for volumes
    Delete cinder role if cinder is not use
    Change expected ostf test fail form 0 to 1 after compute node's
    destroyed

    Change-Id: I68aafc6bc69ed12b4c4c5fd8dd8d749da1cd0369
    Related-Bug: #1323252
    Closes-Bug: #1324976

Reviewed: https://review.openstack.org/97088
Committed: https://git.openstack.org/cgit/stackforge/fuel-main/commit/?id=e88e94435bf608683b854b8af6f68acc185cd6b1
Submitter: Jenkins
Branch: stable/5.0

commit e88e94435bf608683b854b8af6f68acc185cd6b1
Author: Tatyana Leontovich <email address hidden>
Date: Tue May 27 15:05:38 2014 +0300

    5.0 Fix ceph restart tests

    Disable volume_lvm if ceph is using for volumes
    Delete cinder role if cinder is not use
    Change expected ostf test fail form 0 to 1 after compute node's
    destroyed

    Change-Id: I68aafc6bc69ed12b4c4c5fd8dd8d749da1cd0369
    Related-Bug: #1323252

Reviewed: https://review.openstack.org/97190
Committed: https://git.openstack.org/cgit/stackforge/fuel-ostf/commit/?id=4b9ecbae1a0a4b3f69eb0042a70a9bf8e9e715a5
Submitter: Jenkins
Branch: stable/4.1

commit 4b9ecbae1a0a4b3f69eb0042a70a9bf8e9e715a5
Author: Tatyana Leontovich <email address hidden>
Date: Tue May 27 14:37:50 2014 +0300

    4.1.1 Permit infa test only for online nodes

    Permit sanity infrastructure tests only if node is online

    Change-Id: I174120b313fd907e2c41c85dc3cf57a2ea0f35bd
    Related-Bug: #1323252

Problem still appears in 4.1 and 5.0 tests:
http://jenkins-product.srt.mirantis.net:8080/view/4.1_swarm/job/4.1_fuelmain.system_test.centos.thread_5/98/testReport/junit/%28root%29/ceph_ha_restart/ceph_ha_restart/
If OSTF is run after a compute node is destroyed, it fails on the compute connectivity checks, but all tests pass on a second run.
Logs are attached

Changed in fuel:
status: Fix Released → Confirmed
Changed in fuel:
status: Confirmed → Incomplete
Tatyanka (tatyana-leontovich) wrote :

In the OSTF log we can see that 10.108.0.4 is reported as online. We get this data from nailgun, and nailgun says that both computes are online:

2014-06-06 04:28:50 INFO (config) Online compute ips is [u'10.108.0.4', u'10.108.0.7']

However, we fail to SSH to it with a timeout error (SSHTimeout: Connection to the 10.108.0.4 via SSH timed out.). I checked the nailgun logs, and the compute with IP 10.108.0.4 is indeed shown as online there: the agent on this node sends data with status up, and so on. It is strange, but it seems we should check why the nailgun API says that the node is online after it was destroyed, and maybe create a separate issue (maybe we do not fully destroy the node in the system tests).

And in app/log from nailgun we can see that this compute is really online:
 "role": "compute",
                    "vlan_splinters": "disabled",
                    "online": true,
                    "keystone": {
                        "db_password": "SKS34eSw",
                        "admin_token": "U9Y6ODi8"
                    },

2014-06-06 04:27:12.494 DEBUG [7f50fabfd700] (logger) Request PUT /api/nodes/agent/ from 10.108.0.4:49567 {"manufacturer":"QEMU","os_platform":"centos","mac":"64:19:69:F9:12:4B","is_agent":true,"agent_checksum":"18a178639b9814bfeeefca8cfbd9a47b62d20f26","platform_name":"Standard PC (i440FX + PIIX, 1996)","meta":{"disks":[{"model":null,"disk":"disk/by-path/pci-0000:00:0a.0-virtio-pci-virtio7","removable":"0","size":53687091200,"extra":[],"name":"vdc"},{"model":null,"disk":"disk/by-path/pci-0000:00:09.0-virtio-pci-virtio6","removable":"0","size":53687091200,"extra":[],"name":"vdb"},{"model":null,"disk":"disk/by-path/pci-0000:00:08.0-virtio-pci-virtio5","removable":"0","size":53687091200,"extra":[],"name":"vda"}],"memory":{"devices":[{"type":"RAM","size":1610612736}],"maximum_capacity":1610612736,"slots":1,"total":1610612736},"cpu":{"total":1,"real":0,"spec":[{"model":"Intel Xeon E312xx (Sandy Bridge)","frequency":3500}]},"system":{"manufacturer":"QEMU","version":"pc-i440fx-trusty","fqdn":"node-2","product":"Standard PC (i440FX + PIIX, 1996)"},"interfaces":[{"netmask":"255.255.255.0","mac":"64:64:BF:9C:85:C4","state":"up","current_speed":null,"name":"eth4","ip":"10.108.4.3"},{"mac":"64:A8:91:64:84:BA","state":"up","current_speed":null,"name":"eth3"},{"netmask":"255.255.255.0","mac":"64:D4:07:13:ED:B6","state":"up","current_speed":null,"name":"eth2","ip":"10.108.2.4"},{"mac":"64:DD:58:34:20:B6","netmask":"255.255.255.0","state":"up","current_speed":null,"name":"eth1","ip":"10.108.1.4"},{"netmask":"255.255.255.0","mac":"64:19:69:F9:12:4B","state":"up","current_speed":null,"name":"eth0","ip":"10.108.0.4"}]},"ip":"10.108.0.4"}

So it seems that the issue is in our system test: we need to add a wait until the destroyed nodes become offline in nailgun, and only then run OSTF.
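
The wait described above could look roughly like the following. This is a sketch under stated assumptions, not the fuelweb_test code: `get_nodes` is a hypothetical callable that returns the current node list from nailgun as dicts with "id" and "online" keys, and the timeout and interval values are arbitrary.

```python
import time


def wait_until_offline(get_nodes, node_ids, timeout=300, interval=10):
    """Poll until none of `node_ids` is reported online.

    Raises RuntimeError if some of the nodes are still online once
    `timeout` seconds have passed.
    """
    deadline = time.monotonic() + timeout
    while True:
        still_online = [n["id"] for n in get_nodes()
                        if n["id"] in node_ids and n.get("online")]
        if not still_online:
            return
        if time.monotonic() >= deadline:
            raise RuntimeError("nodes still online after %ss: %s"
                               % (timeout, still_online))
        time.sleep(interval)
```

The system test would call this for the destroyed node ids right after destroying them and before running OSTF, so that the online compute list OSTF reads from nailgun no longer contains the destroyed node.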

Tatyanka (tatyana-leontovich) wrote :

Please confirm my assumption and, if it is correct, create a separate issue for the system test and move this one back to fixed.

Changed in fuel:
status: Incomplete → Fix Released

Reviewed: https://review.openstack.org/104594
Committed: https://git.openstack.org/cgit/stackforge/fuel-ostf/commit/?id=0105f1444f3e794bbc1131732de22a1318e4e845
Submitter: Jenkins
Branch: stable/5.0

commit 0105f1444f3e794bbc1131732de22a1318e4e845
Author: Tatyana Leontovich <email address hidden>
Date: Tue May 27 14:37:50 2014 +0300

    Permit infrastructure test only if node is online

    Permit sanity infrastructure tests only if node is online

    Change-Id: I174120b313fd907e2c41c85dc3cf57a2ea0f35bd
    Related-Bug: #1323252
    (cherry picked from commit ab76323cdb6f9c43f9e97177aa920c5452029eef)

Curtis Hovey (sinzui) on 2014-11-12
Changed in fuel:
assignee: Registry Administrators (registry) → nobody