Test deploy_ceph_ha_nodegroups failed with: OSD node 2 is down

Bug #1567460 reported by Egor Kotko
This bug affects 1 person
Affects              Status       Importance   Assigned to      Milestone
Fuel for OpenStack   Incomplete   High         Kyrylo Galanov
Mitaka               Incomplete   High         Kyrylo Galanov

Bug Description

Test deploy_ceph_ha_nodegroups failed with: OSD node 2 is down

Steps to reproduce:
1. Revert snapshot with ready master node
2. Create cluster with Neutron VXLAN, Ceph and custom nodegroup
3. Exclude the first 10 IPs from the range for the default admin/pxe network
4. Bootstrap slave nodes from both default and custom nodegroups
5. Check that excluded IPs aren't allocated to discovered nodes
6. Add 3 controller + ceph nodes from default nodegroup
7. Add 2 compute + ceph nodes from custom nodegroup
8. Deploy cluster
9. Run network verification
10. Run health checks (OSTF)
11. Check that excluded IPs aren't allocated to deployed nodes
12. Check Ceph health
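
For readers unfamiliar with the excluded-IP checks in steps 5 and 11, the idea can be sketched roughly as follows. This is only an illustration, not the fuelweb_test code: the function names, the range start and the node addresses are all hypothetical, and it assumes the excluded block is simply the first 10 consecutive addresses of the range.

```python
import ipaddress


def excluded_ips(range_start, count=10):
    """Return the first `count` addresses of a range starting at range_start."""
    start = ipaddress.ip_address(range_start)
    return {start + i for i in range(count)}


def find_violations(node_ips, range_start, count=10):
    """Return node IPs that fall inside the excluded block (ideally none)."""
    blocked = excluded_ips(range_start, count)
    return sorted(
        ip for ip in map(ipaddress.ip_address, node_ips) if ip in blocked
    )


# Hypothetical addresses, for illustration only (not taken from this report).
nodes = ["10.109.0.13", "10.109.0.14", "10.109.0.5"]
violations = find_violations(nodes, "10.109.0.3")
print([str(ip) for ip in violations])  # ['10.109.0.5']
```

An empty result would mean that no node received an address from the excluded block.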

The test failed at step 12 with the following error:

Traceback (most recent call last):
  File "/usr/lib/python2.7/unittest/case.py", line 331, in run
    testMethod()
  File "/usr/lib/python2.7/unittest/case.py", line 1043, in runTest
    self._testFunc()
  File "/home/jenkins/venv-nailgun-tests-2.9/local/lib/python2.7/site-packages/proboscis/case.py", line 296, in testng_method_mistake_capture_func
    compatability.capture_type_error(s_func)
  File "/home/jenkins/venv-nailgun-tests-2.9/local/lib/python2.7/site-packages/proboscis/compatability/exceptions_2_6.py", line 27, in capture_type_error
    func()
  File "/home/jenkins/venv-nailgun-tests-2.9/local/lib/python2.7/site-packages/proboscis/case.py", line 350, in func
    func(test_case.state.get_state())
  File "/home/jenkins/workspace/9.0.system_test.ubuntu.thread_7/fuelweb_test/helpers/decorators.py", line 120, in wrapper
    result = func(*args, **kwargs)
  File "/home/jenkins/workspace/9.0.system_test.ubuntu.thread_7/fuelweb_test/tests/test_multiple_networks.py", line 498, in deploy_ceph_ha_nodegroups
    self.fuel_web.check_ceph_status(cluster_id)
  File "/home/jenkins/workspace/9.0.system_test.ubuntu.thread_7/fuelweb_test/__init__.py", line 59, in wrapped
    result = func(*args, **kwargs)
  File "/home/jenkins/workspace/9.0.system_test.ubuntu.thread_7/fuelweb_test/models/fuel_web_client.py", line 2144, in check_ceph_status
    ceph.check_disks(remote, [n['id'] for n in online_ceph_nodes])
  File "/home/jenkins/workspace/9.0.system_test.ubuntu.thread_7/fuelweb_test/helpers/ceph.py", line 135, in check_disks
    format(node['id']))
  File "/home/jenkins/venv-nailgun-tests-2.9/local/lib/python2.7/site-packages/proboscis/asserts.py", line 55, in assert_equal
    raise ASSERTION_ERROR(message)
AssertionError: OSD node 2 is down

osd.2 on node-4 was in the down state:
root@node-4:~# ceph osd tree
ID WEIGHT TYPE NAME UP/DOWN REWEIGHT PRIMARY-AFFINITY
-1 0.29993 root default
-2 0.04999 host node-2
 0 0.04999 osd.0 up 1.00000 1.00000
-3 0.04999 host node-3
 1 0.04999 osd.1 up 1.00000 1.00000
-4 0.09998 host node-1
 3 0.04999 osd.3 up 1.00000 1.00000
 5 0.04999 osd.5 up 1.00000 1.00000
-5 0.09998 host node-5
 4 0.04999 osd.4 up 1.00000 1.00000
 6 0.04999 osd.6 up 1.00000 1.00000
 2 0 osd.2 down 0 1.00000
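
When triaging this kind of failure, the down OSDs can be picked out of the plain-text `ceph osd tree` output with a few lines of Python. The sketch below is not the `check_disks` helper from fuelweb_test; `down_osds` is a hypothetical name, and it assumes the status column directly follows the `osd.N` name, as in the output above.

```python
import re


def down_osds(tree_text):
    """Return the names of OSDs reported as 'down' in `ceph osd tree` output."""
    down = []
    for line in tree_text.splitlines():
        fields = line.split()
        # Look for an "osd.N" token immediately followed by the status column.
        for i, field in enumerate(fields[:-1]):
            if re.fullmatch(r"osd\.\d+", field) and fields[i + 1] == "down":
                down.append(field)
    return down


# A few lines copied from the output in this report.
sample = """\
-5 0.09998 host node-5
 4 0.04999 osd.4 up 1.00000 1.00000
 6 0.04999 osd.6 up 1.00000 1.00000
 2 0 osd.2 down 0 1.00000
"""
print(down_osds(sample))  # ['osd.2']
```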

#vim /var/log/ceph/ceph-osd.2.log
2016-04-06 22:21:24.726022 7f12b39c4800 -1 created object store /var/lib/ceph/tmp/mnt.lLq18o journal /var/lib/ceph/tmp/mnt.lLq18o/journal for osd.2 fsid 0980364b-6376-4073-86d6-48f544a8b73c
2016-04-06 22:21:24.726366 7f12b39c4800 -1 auth: error reading file: /var/lib/ceph/tmp/mnt.lLq18o/keyring: can't open /var/lib/ceph/tmp/mnt.lLq18o/keyring: (2) No such file or directory
2016-04-06 22:21:24.726550 7f12b39c4800 -1 created new key in keyring /var/lib/ceph/tmp/mnt.lLq18o/keyring
However, the keyring file does not exist:
root@node-4:~# ls /var/lib/ceph/tmp/mnt.lLq18o/keyring
ls: cannot access /var/lib/ceph/tmp/mnt.lLq18o/keyring: No such file or directory

ISO #fuel-9.0-168

The bug appears to be floating (intermittent); it has already been reproduced in previous test runs.

Egor Kotko (ykotko) wrote :
Bug Checker Bot (bug-checker) wrote : Autochecker

(This check performed automatically)
Please, make sure that bug description contains the following sections filled in with the appropriate data related to the bug you are describing:

actual result

expected result

For more detailed information on the contents of each of the listed sections see https://wiki.openstack.org/wiki/Fuel/How_to_contribute#Here_is_how_you_file_a_bug

tags: added: need-info
Changed in fuel:
status: New → Confirmed
Changed in fuel:
assignee: Fuel Library Team (fuel-library) → Kyrylo Galanov (kgalanov)
tags: added: ceph team-bugfix
Kyrylo Galanov (kgalanov) wrote :

MySQL related issue:

<154>Apr 6 22:14:57 node-4 cinder-api: 2016-04-06 22:14:57.148 30439 CRITICAL cinder [req-36546cf1-50e8-4282-8af8-4aa5b2e111c3 - - - - -] ProgrammingError: (_mysql_exceptions.ProgrammingError) (1146, "Table 'cinder.services' doesn't exist") [SQL: u'SELECT services.created_at AS services_created_at, services.updated_at AS services_updated_at, services.deleted_at AS services_deleted_at, services.deleted AS services_deleted, services.id AS services_id, services.host AS services_host, services.`binary` AS services_binary, services.topic AS services_topic, services.report_count AS services_report_count, services.disabled AS services_disabled, services.availability_zone AS services_availability_zone, services.disabled_reason AS services_disabled_reason, services.modified_at AS services_modified_at, services.rpc_current_version AS services_rpc_current_version, services.object_current_version AS services_object_current_version, services.replication_status AS services_replication_status, services.active_backend_id AS services_active_backend_id, services.frozen AS services_frozen \nFROM services \nWHERE services.deleted = false AND services.`binary` = %s'] [parameters: ('cinder-scheduler',)]
2016-04-06 22:14:57.148 30439 ERROR cinder Traceback (most recent call last):
2016-04-06 22:14:57.148 30439 ERROR cinder File "/usr/bin/cinder-api", line 10, in <module>
2016-04-06 22:14:57.148 30439 ERROR cinder sys.exit(main())
2016-04-06 22:14:57.148 30439 ERROR cinder File "/usr/lib/python2.7/dist-packages/cinder/cmd/api.py", line 61, in main
2016-04-06 22:14:57.148 30439 ERROR cinder server = service.WSGIService('osapi_volume')
2016-04-06 22:14:57.148 30439 ERROR cinder File "/usr/lib/python2.7/dist-packages/cinder/service.py", line 370, in __init__
2016-04-06 22:14:57.148 30439 ERROR cinder self.app = self.loader.load_app(name)
2016-04-06 22:14:57.148 30439 ERROR cinder File "/usr/lib/python2.7/dist-packages/oslo_service/wsgi.py", line 353, in load_app
2016-04-06 22:14:57.148 30439 ERROR cinder return deploy.loa

Kyrylo Galanov (kgalanov) wrote :
Changed in fuel:
status: Confirmed → Incomplete
Changed in fuel:
milestone: 9.0 → 10.0