validations ceph-health incorrectly reports 0 OSDs

Bug #1838556 reported by John Fulton
This bug affects 1 person
Affects: tripleo
Status: Fix Released
Importance: High
Assigned to: Giulio Fidente
Milestone: (none listed)

Bug Description

The Get ceph health task:

https://opendev.org/openstack/tripleo-validations/src/branch/master/roles/ceph/tasks/ceph-health.yaml#L32

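runs a shell command that can fail. When it fails, the validation reports 0 OSDs even if the cluster has more than 0 OSDs. In the log below, the task returns rc 0 although ceph writes errors to STDERR, and the next task then fails with 0.0% of OSDs in.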

2019-07-30 21:15:24,037 p=300 u=mistral | TASK [ceph : Get OSD stat percentage] ******************************************
2019-07-30 21:15:24,037 p=300 u=mistral | task path: /usr/share/openstack-tripleo-validations/roles/ceph/tasks/ceph-health.yaml:60
2019-07-30 21:15:24,225 p=300 u=mistral | Tuesday 30 July 2019 21:15:24 +0000 (0:00:00.488) 0:28:24.400 **********
2019-07-30 21:15:24,809 p=300 u=mistral | changed: [undercloud -> 192.168.24.3] => {
    "changed": true,
    "cmd": "docker exec ceph-mon-centos-7-ovh-bhs1-0009494611 ceph osd stat -f json | jq '( (.num_in_osds) / (.num_osds) ) * 100'",
    "delta": "0:00:00.268931",
    "end": "2019-07-30 21:15:24.778989",
    "rc": 0,
    "start": "2019-07-30 21:15:24.510058"
}

STDERR:

2019-07-30 21:15:24.743 7ff70647f700 -1 Errors while parsing config file!
2019-07-30 21:15:24.743 7ff70647f700 -1 parse_file: cannot open /etc/ceph/ceph.conf: (2) No such file or directory
2019-07-30 21:15:24.743 7ff70647f700 -1 parse_file: cannot open /root/.ceph/ceph.conf: (2) No such file or directory
2019-07-30 21:15:24.743 7ff70647f700 -1 parse_file: cannot open ceph.conf: (2) No such file or directory
Error initializing cluster client: ObjectNotFound('error calling conf_read_file',)

2019-07-30 21:15:24,996 p=300 u=mistral | TASK [ceph : Fail if there is an unacceptable percentage of in OSDs] ***********
2019-07-30 21:15:24,997 p=300 u=mistral | task path: /usr/share/openstack-tripleo-validations/roles/ceph/tasks/ceph-health.yaml:65
2019-07-30 21:15:25,028 p=300 u=mistral | Tuesday 30 July 2019 21:15:25 +0000 (0:00:00.803) 0:28:25.204 **********
2019-07-30 21:15:25,168 p=300 u=mistral | fatal: [undercloud -> 192.168.24.3]: FAILED! => {
    "changed": false
}

MSG:

Only 0.0% of OSDs are in, but 66% are required
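
As an illustration of the failure mode above, here is a minimal Python sketch of the same percentage calculation that refuses to treat empty or unusable output as 0%. It is not the code used by tripleo-validations; the function name osd_in_percentage is hypothetical, and the JSON keys num_osds and num_in_osds are taken from the jq expression in the log.

```python
import json


def osd_in_percentage(raw_output: str) -> float:
    """Percentage of OSDs that are 'in', from `ceph osd stat -f json` output.

    Hypothetical helper: raises ValueError instead of silently yielding 0
    when the output is empty or unusable, as happens when ceph itself fails.
    """
    if not raw_output.strip():
        raise ValueError("ceph osd stat produced no output; the command likely failed")
    try:
        stat = json.loads(raw_output)
    except json.JSONDecodeError as exc:
        raise ValueError(f"ceph osd stat output is not valid JSON: {exc}") from exc
    try:
        # Key names assumed from the jq expression shown in the log.
        num_osds = stat["num_osds"]
        num_in_osds = stat["num_in_osds"]
    except (KeyError, TypeError) as exc:
        raise ValueError("ceph osd stat output lacks the expected OSD counters") from exc
    if num_osds == 0:
        raise ValueError("cluster reports 0 OSDs")
    return num_in_osds / num_osds * 100
```

With a check like this, a failed ceph call would surface as an error about the command rather than as a misleading "0.0% of OSDs are in" result.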

Changed in tripleo:
importance: Undecided → High
Rabi Mishra (rabi) wrote :
OpenStack Infra (hudson-openstack) wrote : Fix proposed to tripleo-validations (master)

Fix proposed to branch: master
Review: https://review.opendev.org/674778

Changed in tripleo:
assignee: John Fulton (jfulton-org) → Giulio Fidente (gfidente)
status: Triaged → In Progress
John Fulton (jfulton-org) wrote :

This bug report deserves some history to help avoid confusion:

- multinode010 had the issue reported in this bug because the health check can't handle custom ceph cluster names

- PS3 of https://review.opendev.org/#/c/673545 succeeded on multinode010 when the job did not use a custom ceph cluster name

- However, we want multinode010 to continue testing custom ceph cluster names, so 673545 depended on https://review.opendev.org/#/c/674217 in PS4

- https://review.opendev.org/#/c/674217 then merged (but shouldn't have) and introduced the symptoms in this bug into voting scenarios 001/004

- 674217 was then reverted in https://review.opendev.org/#/c/674771 and voting scenarios 001/004 are green again

- We now just have non-voting multinode010 broken and its fix, 673545, will need to depend on an update to validations which supports custom cluster names but doesn't introduce problems the way that 674217 did.

Next things to do:

1. patch tripleo-validations to improve the Ceph health checks so they work with custom cluster names
2. revise patch 673545 to depend on 1
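
For illustration only, a health check could take the cluster name as a parameter along these lines. This is a sketch, not the content of the proposed patch: the function name ceph_osd_stat_command and its parameters are hypothetical, and the only things assumed about the tooling are the ceph CLI's --cluster option and the docker exec invocation shown in the log above.

```python
from typing import List


def ceph_osd_stat_command(container: str, cluster: str = "ceph") -> List[str]:
    """Build the argv for `ceph osd stat` inside a monitor container.

    Hypothetical helper: passing --cluster explicitly lets ceph look for
    the configuration of the named cluster instead of the default "ceph".
    """
    return [
        "docker", "exec", container,
        "ceph", "--cluster", cluster,
        "osd", "stat", "-f", "json",
    ]
```

Returning an argument list rather than a shell string also avoids piping through a shell, so the exit status of the ceph command can be checked directly by the caller.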

John Fulton (jfulton-org) wrote :

Regarding next things to do:

> 1. patch tripleo-validations to improve the Ceph health checks so they work with custom cluster names

 https://review.opendev.org/#/c/674778/

> 2. revise patch 673545 to depend on 1

Done.

OpenStack Infra (hudson-openstack) wrote : Fix merged to tripleo-validations (master)

Reviewed: https://review.opendev.org/674778
Committed: https://git.openstack.org/cgit/openstack/tripleo-validations/commit/?id=dd1c4dd16a7621ca93db5f74be4bdcaa8d78ad8f
Submitter: Zuul
Branch: master

commit dd1c4dd16a7621ca93db5f74be4bdcaa8d78ad8f
Author: Giulio Fidente <email address hidden>
Date: Tue Aug 6 12:59:34 2019 +0200

    Use custom Ceph cluster name in validations

    Also adds scenario004 as check and gate job for changes to the
    Ceph roles.

    Change-Id: Ic895c81f11b61e7310b7ca17fad4693c4eea0418
    Closes-Bug: 1838556

Changed in tripleo:
status: In Progress → Fix Released
OpenStack Infra (hudson-openstack) wrote : Fix proposed to tripleo-validations (stable/stein)

Fix proposed to branch: stable/stein
Review: https://review.opendev.org/675586

OpenStack Infra (hudson-openstack) wrote : Fix merged to tripleo-validations (stable/stein)

Reviewed: https://review.opendev.org/675586
Committed: https://git.openstack.org/cgit/openstack/tripleo-validations/commit/?id=7b259086e07402ccd997427fbb98884911b2ea84
Submitter: Zuul
Branch: stable/stein

commit 7b259086e07402ccd997427fbb98884911b2ea84
Author: Giulio Fidente <email address hidden>
Date: Tue Aug 6 12:59:34 2019 +0200

    Use custom Ceph cluster name in validations

    Also adds scenario004 as check and gate job for changes to the
    Ceph roles.

    Change-Id: Ic895c81f11b61e7310b7ca17fad4693c4eea0418
    Closes-Bug: 1838556
    (cherry picked from commit dd1c4dd16a7621ca93db5f74be4bdcaa8d78ad8f)

tags: added: in-stable-stein
OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/tripleo-validations 10.5.1

This issue was fixed in the openstack/tripleo-validations 10.5.1 release.

OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/tripleo-validations 11.2.0

This issue was fixed in the openstack/tripleo-validations 11.2.0 release.
