Ceph health is not OK, one of the nodes is offline after revert-resume

Bug #1627020 reported by Alexey Kalashnikov
Affects              Status      Importance   Assigned to     Milestone
Fuel for OpenStack   Won't Fix   Medium       Fuel QA Team
Mitaka               Confirmed   Medium       Fuel QA Team

Bug Description

Swarm test "Suspend rabbit master, check neutron cluster, resume nodes, check cluster", failed with error message:
https://product-ci.infra.mirantis.net/job/9.x.system_test.ubuntu.ha_destructive_ceph_neutron/69/testReport/(root)/ha_ceph_neutron_rabbit_master_destroy/ha_ceph_neutron_rabbit_master_destroy/
Ceph HEALTH is not OK on slave-04_compute_ceph-osd. Details: []

After the revert-resume of the environment, node-3 (slave-01_controller_ceph-osd) is simply shut off:
 13033 9.x.system_test.ubuntu.ha_destructive_ceph_neutron.69_admin running
 13034 9.x.system_test.ubuntu.ha_destructive_ceph_neutron.69_slave-02 running
 13035 9.x.system_test.ubuntu.ha_destructive_ceph_neutron.69_slave-03 running
 13036 9.x.system_test.ubuntu.ha_destructive_ceph_neutron.69_slave-04 running
 13037 9.x.system_test.ubuntu.ha_destructive_ceph_neutron.69_slave-05 running
 - 9.x.system_test.ubuntu.ha_destructive_ceph_neutron.69_slave-01 shut off
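
For reference, the shut-off domain can be started and polled until it reports "running" again; below is a minimal sketch that drives the virsh CLI from Python (the domain name is taken from the listing above, the timeout and interval values are arbitrary):

    import subprocess
    import time

    DOMAIN = "9.x.system_test.ubuntu.ha_destructive_ceph_neutron.69_slave-01"

    def start_and_wait(domain, timeout=300, interval=10):
        # Start the shut-off libvirt domain and poll `virsh domstate`
        # until it reports "running" or the timeout expires.
        subprocess.check_call(["virsh", "start", domain])
        deadline = time.time() + timeout
        while time.time() < deadline:
            state = subprocess.check_output(
                ["virsh", "domstate", domain]).decode().strip()
            if state == "running":
                return
            time.sleep(interval)
        raise RuntimeError("%s did not reach 'running' in %ss" % (domain, timeout))

    start_and_wait(DOMAIN)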

fuel nodes + ceph health:
http://paste.openstack.org/show/582745/

In the test code I saw that we are supposed to wait until the node is back online before checking the Ceph health status. The strange thing is that the error appears only when we assert that the status is not OK, but there is no warning that the suspended node did not come back online in time.
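
A minimal sketch of what such a check could look like (the is_node_online and get_ceph_health helpers are hypothetical placeholders, not the actual fuel-qa API): wait for the node, fail with an explicit message if it never comes back, and only then assert on the health string:

    import time

    def wait_for(predicate, timeout=300, interval=10):
        # Poll predicate() until it returns True or the timeout expires.
        deadline = time.time() + timeout
        while time.time() < deadline:
            if predicate():
                return True
            time.sleep(interval)
        return False

    def check_ceph_after_resume(node, is_node_online, get_ceph_health):
        # Complain about the node itself first, instead of falling through
        # to a confusing "Ceph HEALTH is not OK" assertion.
        assert wait_for(lambda: is_node_online(node)), (
            "Node %s did not come back online after resume" % node)
        health = get_ceph_health(node)
        assert health.startswith("HEALTH_OK"), (
            "Ceph HEALTH is not OK on %s. Details: %s" % (node, health))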

If we turn node-3 back on and check ceph health:
root@node-1:~# ceph health
HEALTH_WARN too many PGs per OSD (341 > max 300)
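
So with node-3 powered on again the cluster only reports the long-standing PG-count warning. A rough sketch of how a check might tolerate that particular HEALTH_WARN while still failing on anything else (the warning pattern is an assumption):

    import re

    PG_COUNT_WARNING = re.compile(r"too many PGs per OSD")

    def ceph_health_acceptable(health_line):
        # HEALTH_OK is fine; HEALTH_WARN is fine only if it is the PG-count message.
        if health_line.startswith("HEALTH_OK"):
            return True
        if health_line.startswith("HEALTH_WARN"):
            return bool(PG_COUNT_WARNING.search(health_line))
        return False

    print(ceph_health_acceptable("HEALTH_WARN too many PGs per OSD (341 > max 300)"))  # True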
--------------------------------
Scenario:
            1. Revert snapshot prepare_ha_ceph_neutron
            2. Wait galera is up, keystone re-trigger tokens
            3. Create instance, assign floating ip
            5. Ping instance by floating ip
            6. Suspend rabbit-master controller
            7. Run OSTF ha suite
            8. Ping created instance
            9. Suspend second rabbit-master controller
            10. Turn on controller from step 6
            11. Run OSTF ha suite
            12. Ping instance
            13. Turn on controller from step 9
            14. Run OSTF ha suite
            15. Ping instance
            16. Run OSTF

        Duration 40m
        """
-----------------------------

Diagnostic snapshot:
https://drive.google.com/open?id=0B0EB6QSDWt2vOHlnS01veWxNd00

Changed in fuel:
milestone: none → 9.2
status: New → Confirmed
Roman Vyalov (r0mikiam)
Changed in fuel:
status: Confirmed → Won't Fix