Ceph health is not OK, one of the nodes is offline after revert-resume

Bug #1627020 reported by Alexey Kalashnikov
Affects              Status      Importance   Assigned to     Milestone
Fuel for OpenStack   Won't Fix   Medium       Fuel QA Team
Mitaka               Confirmed   Medium       Fuel QA Team

Bug Description

Swarm test "Suspend rabbit master, check neutron cluster, resume nodes, check cluster", failed with error message:
https://product-ci.infra.mirantis.net/job/9.x.system_test.ubuntu.ha_destructive_ceph_neutron/69/testReport/(root)/ha_ceph_neutron_rabbit_master_destroy/ha_ceph_neutron_rabbit_master_destroy/
Ceph HEALTH is not OK on slave-04_compute_ceph-osd. Details: []

After the revert-resume of the environment, node-3 (slave-01_controller_ceph-osd) is simply shut off:
 13033 9.x.system_test.ubuntu.ha_destructive_ceph_neutron.69_admin running
 13034 9.x.system_test.ubuntu.ha_destructive_ceph_neutron.69_slave-02 running
 13035 9.x.system_test.ubuntu.ha_destructive_ceph_neutron.69_slave-03 running
 13036 9.x.system_test.ubuntu.ha_destructive_ceph_neutron.69_slave-04 running
 13037 9.x.system_test.ubuntu.ha_destructive_ceph_neutron.69_slave-05 running
 - 9.x.system_test.ubuntu.ha_destructive_ceph_neutron.69_slave-01 shut off
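
For reference, the shut-off domain can be started and polled until it reports "running" again; below is a minimal sketch that drives the virsh CLI from Python (the domain name is taken from the listing above, the timeout and interval values are arbitrary):

    import subprocess
    import time

    DOMAIN = "9.x.system_test.ubuntu.ha_destructive_ceph_neutron.69_slave-01"

    def start_and_wait(domain, timeout=300, interval=10):
        # Start the shut-off libvirt domain and poll `virsh domstate`
        # until it reports "running" or the timeout expires.
        subprocess.check_call(["virsh", "start", domain])
        deadline = time.time() + timeout
        while time.time() < deadline:
            state = subprocess.check_output(
                ["virsh", "domstate", domain]).decode().strip()
            if state == "running":
                return
            time.sleep(interval)
        raise RuntimeError("%s did not reach 'running' in %ss" % (domain, timeout))

    start_and_wait(DOMAIN)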

fuel nodes + ceph health:
http://paste.openstack.org/show/582745/

In the test code I saw that we are supposed to wait until the node is back online before checking the Ceph health status. The strange thing is that the error appears only when we assert that the status is not OK, but there is no warning that the suspended node did not come back online in time.
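
A minimal sketch of what such a check could look like (the is_node_online and get_ceph_health helpers are hypothetical placeholders, not the actual fuel-qa API): wait for the node, fail with an explicit message if it never comes back, and only then assert on the health string:

    import time

    def wait_for(predicate, timeout=300, interval=10):
        # Poll predicate() until it returns True or the timeout expires.
        deadline = time.time() + timeout
        while time.time() < deadline:
            if predicate():
                return True
            time.sleep(interval)
        return False

    def check_ceph_after_resume(node, is_node_online, get_ceph_health):
        # Complain about the node itself first, instead of falling through
        # to a confusing "Ceph HEALTH is not OK" assertion.
        assert wait_for(lambda: is_node_online(node)), (
            "Node %s did not come back online after resume" % node)
        health = get_ceph_health(node)
        assert health.startswith("HEALTH_OK"), (
            "Ceph HEALTH is not OK on %s. Details: %s" % (node, health))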

If we turn node-3 back on and check ceph health:
root@node-1:~# ceph health
HEALTH_WARN too many PGs per OSD (341 > max 300)
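
So with node-3 powered on again the cluster only reports the long-standing PG-count warning. A rough sketch of how a check might tolerate that particular HEALTH_WARN while still failing on anything else (the warning pattern is an assumption):

    import re

    PG_COUNT_WARNING = re.compile(r"too many PGs per OSD")

    def ceph_health_acceptable(health_line):
        # HEALTH_OK is fine; HEALTH_WARN is fine only if it is the PG-count message.
        if health_line.startswith("HEALTH_OK"):
            return True
        if health_line.startswith("HEALTH_WARN"):
            return bool(PG_COUNT_WARNING.search(health_line))
        return False

    print(ceph_health_acceptable("HEALTH_WARN too many PGs per OSD (341 > max 300)"))  # True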
--------------------------------
Scenario:
            1. Revert snapshot prepare_ha_ceph_neutron
            2. Wait galera is up, keystone re-trigger tokens
            3. Create instance, assign floating ip
            5. Ping instance by floating ip
            6. Suspend rabbit-master controller
            7. Run OSTF ha suite
            8. Ping created instance
            9. Suspend second rabbit-master controller
            10. Turn on controller from step 6
            11. Run OSTF ha suite
            12. Ping instance
            13. Turn on controller from step 9
            14. Run OSTF ha suite
            15. Ping instance
            16. Run OSTF

        Duration 40m
        """
-----------------------------

Diagnostic snapshot:
https://drive.google.com/open?id=0B0EB6QSDWt2vOHlnS01veWxNd00

Changed in fuel:
milestone: none → 9.2
status: New → Confirmed
Roman Vyalov (r0mikiam)
Changed in fuel:
status: Confirmed → Won't Fix