Ceph observed to briefly report health error during regression
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
StarlingX | Fix Released | Medium | Ovidiu Poncea |
Bug Description
Brief Description
-----------------
Ceph briefly reports HEALTH_ERR during test execution; the error clears on its own shortly afterward.
Severity
--------
Major
Steps to Reproduce
------------------
1. In test_storgroup_ the host list is queried; all hosts are unlocked/enabled/available:
[2018-10-28 03:39:14,302] 262 DEBUG MainThread ssh.send :: Send 'system --os-username 'admin' --os-password 'Li69nux*' --os-project-name admin --os-auth-url http://
[2018-10-28 03:39:15,970] 382 DEBUG MainThread ssh.expect :: Output:
+----+--------------+-------------+----------------+-------------+--------------+
| id | hostname     | personality | administrative | operational | availability |
+----+--------------+-------------+----------------+-------------+--------------+
| 1  | controller-0 | controller  | unlocked       | enabled     | available    |
| 2  | controller-1 | controller  | unlocked       | enabled     | available    |
| 3  | storage-0    | storage     | unlocked       | enabled     | available    |
| 4  | storage-1    | storage     | unlocked       | enabled     | available    |
| 5  | compute-0    | compute     | unlocked       | enabled     | available    |
| 6  | compute-1    | compute     | unlocked       | enabled     | available    |
| 7  | compute-2    | compute     | unlocked       | enabled     | available    |
| 8  | compute-3    | compute     | unlocked       | enabled     | available    |
+----+--------------+-------------+----------------+-------------+--------------+
2. Then the health of the ceph cluster is checked (okay):
[2018-10-28 03:39:16,074] 194 INFO MainThread verify_
[2018-10-28 03:39:16,074] 419 DEBUG MainThread ssh.exec_cmd:: Executing command...
[2018-10-28 03:39:16,075] 262 DEBUG MainThread ssh.send :: Send 'ceph -s'
[2018-10-28 03:39:16,362] 382 DEBUG MainThread ssh.expect :: Output:
cluster 71b975c7-
health HEALTH_OK
monmap e8: 2 mons at {controller-
osdmap e272: 4 osds: 4 up, 4 in
flags sortbitwise,
pgmap v9981: 2176 pgs, 13 pools, 1786 MB data, 1356 objects
3790 MB used, 2599 GB / 2602 GB avail
3. The cluster health is checked again moments later, and the following is seen:
[2018-10-28 03:39:16,466] 198 INFO MainThread verify_
[2018-10-28 03:39:16,466] 419 DEBUG MainThread ssh.exec_cmd:: Executing command...
[2018-10-28 03:39:16,466] 262 DEBUG MainThread ssh.send :: Send 'ceph -s'
[2018-10-28 03:39:16,747] 382 DEBUG MainThread ssh.expect :: Output:
cluster 71b975c7-
health HEALTH_ERR
no osds
monmap e1: 1 mons at {controller-
osdmap e1: 0 osds: 0 up, 0 in
flags sortbitwise,
pgmap v2: 64 pgs, 1 pools, 0 bytes data, 0 objects
0 kB used, 0 kB / 0 kB avail
[wrsroot@
[2018-10-28 03:39:16,747] 262 DEBUG MainThread ssh.send :: Send 'echo $?'
[2018-10-28 03:39:16,850] 382 DEBUG MainThread ssh.expect :: Output:
0
[wrsroot@
[2018-10-28 03:39:16,850] 72 INFO MainThread storage_
[2018-10-28 03:39:16,851] 200 INFO MainThread verify_
[2018-10-28 03:39:16,866] 53 DEBUG MainThread conftest.
4. Shortly afterward, alarms are checked (only the NTP alarm is present):
[2018-10-28 03:39:23,095] 262 DEBUG MainThread ssh.send :: Send 'fm --os-username 'admin' --os-password 'Li69nux*' --os-project-name admin --os-auth-url http://
[2018-10-28 03:39:24,455] 382 DEBUG MainThread ssh.expect :: Output:
+------
| UUID | Alarm ID | Reason Text | Entity ID | Severity | Time Stamp |
+------
| 70a0a511-
+------
[wrsroot@
It looks like a short-term blip in the ceph state. Notably, the failing snapshot reports the initial epochs (monmap e1, osdmap e1, pgmap v2) and zero capacity for the same cluster that moments earlier reported monmap e8 and osdmap e272, as if the query were answered from a monitor holding a fresh, empty state.
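If that is what is happening, querying each monitor directly should show which one briefly returns the epoch-1 maps. A minimal diagnostic sketch in Python, assuming the ceph CLI's standard -m option for targeting a single monitor; MON_ADDR_0 and MON_ADDR_1 are hypothetical placeholders, since the real monitor addresses are truncated in the logs above:

    # Sketch only: query each monitor individually via 'ceph -s -m <addr>'.
    # MON_ADDR_0/MON_ADDR_1 are hypothetical placeholders for the two
    # controller monitor addresses (truncated in the logs above).
    import subprocess

    for mon in ('MON_ADDR_0', 'MON_ADDR_1'):
        out = subprocess.check_output(['ceph', '-s', '-m', mon]).decode()
        summary = [l.strip() for l in out.splitlines()
                   if l.strip().startswith(('health', 'monmap', 'osdmap'))]
        print(mon, '->', '; '.join(summary))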
Looking at test_swift_, the same transient HEALTH_ERR shows up again about a minute later:
[2018-10-28 03:40:14,675] 382 DEBUG MainThread ssh.expect :: Output:
cluster 71b975c7-
health HEALTH_ERR
no osds
monmap e1: 1 mons at {controller-
osdmap e1: 0 osds: 0 up, 0 in
flags sortbitwise,
pgmap v2: 64 pgs, 1 pools, 0 bytes data, 0 objects
0 kB used, 0 kB / 0 kB avail
[wrsroot@
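Given that the error clears on its own, the test harness could distinguish this kind of transient blip from a persistent failure by polling the health state for a short window before asserting. A minimal sketch, assuming local access to the ceph CLI; wait_for_health_ok and its 30-second/2-second defaults are illustrative, not taken from the actual test suite:

    # Sketch only: tolerate short-lived HEALTH_ERR blips by polling 'ceph -s'.
    # Function name and timing values are illustrative, not suite defaults.
    import subprocess
    import time

    def wait_for_health_ok(timeout=30, interval=2):
        """Return True if 'ceph -s' reports HEALTH_OK within 'timeout' seconds."""
        deadline = time.time() + timeout
        while time.time() < deadline:
            out = subprocess.check_output(['ceph', '-s']).decode()
            if 'HEALTH_OK' in out:
                return True
            time.sleep(interval)
        return False

Polling like this would only mask the symptom in the automation; why a monitor briefly serves empty maps would still need a root-cause fix.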
Expected Behavior
------------------
Ceph health error not seen
Actual Behavior
----------------
Ceph health error briefly seen and then clears
Reproducibility
---------------
Intermittent.
System Configuration
--------------------
Storage
Branch/Pull Time/Commit
-----------------------
master as of 2018-10-26_11-56-15
Changed in starlingx:
status: Triaged → In Progress
tags: added: stx.2019.05; removed: stx.2019.03
tags: added: stx.2.0; removed: stx.2019.05
Targeting stx.2019.03. Issue is related to the Ceph on all-in-one feature.