Ceph_health is not OK, one of the nodes is offline after revert-resume
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
Fuel for OpenStack | Won't Fix | Medium | Fuel QA Team |
Mitaka | Confirmed | Medium | Fuel QA Team |
Bug Description
Swarm test "Suspend rabbit master, check neutron cluster, resume nodes, check cluster" failed with the error message:
https:/
Ceph HEALTH is not OK on slave-04_
After revert-resume env, node-3(
13033 9.x.system_
13034 9.x.system_
13035 9.x.system_
13036 9.x.system_
13037 9.x.system_
- 9.x.system_
fuel nodes + ceph health:
http://
In the test code, I saw that we should wait until the node is back online before checking the Ceph health status. The strange thing is that the error appears only when we assert that the status is not OK; there is no warning that the suspended node did not come back online in time.
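For illustration only, a minimal sketch of the ordering the test seems to intend: fail loudly on the offline node first, then assert on Ceph health. The node IP, SSH target, and helper name below are hypothetical, not the actual fuel-qa helpers:

```python
import subprocess
import time

def wait_node_online(node_ip, timeout=600, interval=10):
    """Poll until the node answers ping, or raise on timeout."""
    deadline = time.time() + timeout
    while time.time() < deadline:
        rc = subprocess.call(
            ["ping", "-c", "1", "-W", "2", node_ip],
            stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)
        if rc == 0:
            return
        time.sleep(interval)
    raise TimeoutError(
        "node %s did not come back online within %ds" % (node_ip, timeout))

# Report the real problem (node never resumed) before the derived one
# (Ceph health degraded because an OSD host is down).
wait_node_online("10.109.0.5")  # hypothetical admin-network IP of node-3
health = subprocess.check_output(
    ["ssh", "root@node-1", "ceph", "health"]).decode().strip()
assert health.startswith("HEALTH_OK"), "Ceph health is not OK: %s" % health
```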
If we turn node-3 back on and check ceph health:
root@node-1:~# ceph health
HEALTH_WARN too many PGs per OSD (341 > max 300)
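For context on the warning itself: it comes from Ceph's mon_pg_warn_max_per_osd threshold (300 by default in pre-Luminous Ceph), and the reported figure is roughly the replica-weighted PG total divided by the OSD count. A back-of-the-envelope sketch, where the pool layout and OSD count are hypothetical, merely chosen to reproduce the 341 figure:

```python
# Rough reconstruction of the "PGs per OSD" figure from the warning:
# each pool contributes pg_num placement groups times its replica size,
# and the sum is divided by the number of OSDs in the cluster.
pools = [
    {"pg_num": 512, "size": 2},  # hypothetical pool layout
    {"pg_num": 512, "size": 2},
]
osd_count = 6                    # hypothetical
pgs_per_osd = sum(p["pg_num"] * p["size"] for p in pools) / osd_count
print(round(pgs_per_osd))        # 341 -> HEALTH_WARN once it exceeds 300
```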
-------
Scenario:
1. Revert snapshot prepare_
2. Wait galera is up, keystone re-trigger tokens
3. Create instance, assign floating ip
4. Ping instance by floating ip
5. Suspend rabbit-master controller
6. Run OSTF ha suite
7. Ping created instance
8. Suspend second rabbit-master controller
9. Turn on controller from step 5
10. Run OSTF ha suite
11. Ping instance
12. Turn on controller from step 8
13. Run OSTF ha suite
14. Ping instance
15. Run OSTF
Duration 40m
"""
-------
Diagnostic snapshot:
https:/
Changed in fuel:
milestone: none → 9.2
status: New → Confirmed
Changed in fuel:
status: Confirmed → Won't Fix