TASK [ceph : Get OSD stat percentage]: null and null cannot be divided

Bug #1882387 reported by John Fulton
Affects: tripleo
Status: Fix Released
Importance: High
Assigned to: John Fulton

Bug Description

Deployment with internal Ceph fails with the following message:

TASK [ceph : Get OSD stat percentage] ******************************************************************
Friday 05 June 2020 20:09:42 +0000 (0:00:00.298) 0:33:33.740 ***********
fatal: [undercloud -> 192.168.24.14]: FAILED! => {"ansible_facts": {"discovered_interpreter_python": "/usr/libexec/platform-python"}, "changed": true, "cmd": "\"podman\" exec \"ceph-mon-oc0-controller-0\" ceph --cluster \"ceph\" osd stat -f json | jq '( (.num_in_osds) / (.num_osds) ) * 100'", "delta": "0:00:00.664333", "end": "2020-06-05 20:09:43.389273", "msg": "non-zero return code", "rc": 5, "start": "2020-06-05 20:09:42.724940", "stderr": "jq: error (at <stdin>:1): null (null) and null (null) cannot be divided", "stderr_lines": ["jq: error (at <stdin>:1): null (null) and null (null) cannot be divided"], "stdout": "", "stdout_lines": []}

Revision history for this message
John Fulton (jfulton-org) wrote :

Ceph health is actually fine: all 12 OSDs are up and in, and the only warning is about too few PGs per OSD.

[root@oc0-controller-0 ~]# podman exec ceph-mon-$HOSTNAME ceph -s
  cluster:
    id: a7c1c1e4-5cd6-4f1c-8bc2-a37140ee09a8
    health: HEALTH_WARN
            too few PGs per OSD (8 < min 30)

  services:
    mon: 3 daemons, quorum oc0-controller-2,oc0-controller-0,oc0-controller-1 (age 22h)
    mgr: oc0-controller-2(active, since 22h), standbys: oc0-controller-0, oc0-controller-1
    osd: 12 osds: 12 up (since 22h), 12 in (since 22h)

  data:
    pools: 3 pools, 96 pgs
    objects: 0 objects, 0 B
    usage: 12 GiB used, 588 GiB / 600 GiB avail
    pgs: 96 active+clean

[root@oc0-controller-0 ~]#

Revision history for this message
John Fulton (jfulton-org) wrote :

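num_in_osds and num_osds are integers in the JSON [1], and jq-1.6 is in the ceph container [2].

The filter in the jq_osd_percentage_filter variable [3] does not retrieve the values correctly: it reads .num_in_osds and .num_osds from the top level, but both are nested under the osdmap key, so jq gets null for each.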

[1]
[root@oc0-controller-0 ~]# podman exec ceph-mon-oc0-controller-0 ceph --cluster ceph osd stat -f json | jq .
{
  "osdmap": {
    "epoch": 37,
    "num_osds": 12,
    "num_up_osds": 12,
    "num_in_osds": 12,
    "num_remapped_pgs": 0
  }
}
[root@oc0-controller-0 ~]#

[2]
# podman exec ceph-mon-oc0-controller-0 jq --version
jq-1.6

[3]
https://github.com/openstack/tripleo-validations/blob/master/roles/ceph/tasks/ceph-health.yaml#L68
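
To illustrate why the lookup comes back empty, here is a minimal Python sketch that mirrors the filter against the JSON from [1]. It is not the validation's code: the function names are hypothetical, and Python's None stands in for jq's null when a key is missing.

```python
import json

# JSON as printed by `ceph osd stat -f json` in [1].
OSD_STAT = json.loads("""
{
  "osdmap": {
    "epoch": 37,
    "num_osds": 12,
    "num_up_osds": 12,
    "num_in_osds": 12,
    "num_remapped_pgs": 0
  }
}
""")


def top_level_lookup(stat):
    """Mimic the failing filter: read both counters from the top level."""
    # Neither key exists at the top level, so both lookups yield None.
    return stat.get("num_in_osds"), stat.get("num_osds")


def nested_percentage(stat):
    """Read the counters from under 'osdmap' and compute the 'in' percentage."""
    osdmap = stat["osdmap"]
    return osdmap["num_in_osds"] / osdmap["num_osds"] * 100


print(top_level_lookup(OSD_STAT))   # (None, None)
print(nested_percentage(OSD_STAT))  # 100.0
```

Dividing the two None values fails in Python much as null / null fails in jq, while the nested lookup returns the expected percentage.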

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to tripleo-validations (master)

Fix proposed to branch: master
Review: https://review.opendev.org/733971

Changed in tripleo:
status: Triaged → In Progress
tags: added: ussuri-backport-potential
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to tripleo-validations (master)

Reviewed: https://review.opendev.org/733971
Committed: https://git.openstack.org/cgit/openstack/tripleo-validations/commit/?id=aa330a8e99e7280c87ee8b1988b7951f6c5addb0
Submitter: Zuul
Branch: master

commit aa330a8e99e7280c87ee8b1988b7951f6c5addb0
Author: John Fulton <email address hidden>
Date: Sat Jun 6 19:57:58 2020 +0000

    Update Ceph role's Get OSD stat to use new data structure

    The variables for 'num_osds' and similar are now enclosed
    in an additional map whose key is 'osdmap' so we need to
    use that key when extracting them.

    Change-Id: I40bed1fdc9cd39295a1b9f9aaed21d3814d7e2b5
    Closes-Bug: #1882387

Changed in tripleo:
status: In Progress → Fix Released
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to tripleo-validations (stable/ussuri)

Fix proposed to branch: stable/ussuri
Review: https://review.opendev.org/734068

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to tripleo-validations (stable/train)

Fix proposed to branch: stable/train
Review: https://review.opendev.org/734069

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to tripleo-validations (stable/ussuri)

Reviewed: https://review.opendev.org/734068
Committed: https://git.openstack.org/cgit/openstack/tripleo-validations/commit/?id=0bc2cad4eeb3fb25e614644eec1e075f5eee8049
Submitter: Zuul
Branch: stable/ussuri

commit 0bc2cad4eeb3fb25e614644eec1e075f5eee8049
Author: John Fulton <email address hidden>
Date: Sat Jun 6 19:57:58 2020 +0000

    Update Ceph role's Get OSD stat to use new data structure

    The variables for 'num_osds' and similar are now enclosed
    in an additional map whose key is 'osdmap' so we need to
    use that key when extracting them.

    Change-Id: I40bed1fdc9cd39295a1b9f9aaed21d3814d7e2b5
    Closes-Bug: #1882387
    (cherry picked from commit aa330a8e99e7280c87ee8b1988b7951f6c5addb0)

tags: added: in-stable-ussuri
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to tripleo-validations (stable/train)

Reviewed: https://review.opendev.org/734069
Committed: https://git.openstack.org/cgit/openstack/tripleo-validations/commit/?id=08f469d7e85c16aa28e8a3f43450089778ea3d00
Submitter: Zuul
Branch: stable/train

commit 08f469d7e85c16aa28e8a3f43450089778ea3d00
Author: John Fulton <email address hidden>
Date: Sat Jun 6 19:57:58 2020 +0000

    Update Ceph role's Get OSD stat to use new data structure

    The variables for 'num_osds' and similar are now enclosed
    in an additional map whose key is 'osdmap' so we need to
    use that key when extracting them.

    Change-Id: I40bed1fdc9cd39295a1b9f9aaed21d3814d7e2b5
    Closes-Bug: #1882387
    (cherry picked from commit aa330a8e99e7280c87ee8b1988b7951f6c5addb0)

tags: added: in-stable-train
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Related fix proposed to tripleo-validations (master)

Related fix proposed to branch: master
Review: https://review.opendev.org/738855

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Related fix merged to tripleo-validations (master)

Reviewed: https://review.opendev.org/738855
Committed: https://git.openstack.org/cgit/openstack/tripleo-validations/commit/?id=35821c44dfe448ae6a381ee873e3945ab899e431
Submitter: Zuul
Branch: master

commit 35821c44dfe448ae6a381ee873e3945ab899e431
Author: Giulio Fidente <email address hidden>
Date: Wed Jul 1 16:18:16 2020 +0200

    Make Get OSD stat percentage compatible with both Luminous and Nautilus

    On upgrade we might be using the newer validations against a Luminous
    cluster, hence the check needs to work in both cases.

    Change-Id: I16e7eb081640332ac69ee0b32d3d871fab40d447
    Related-Bug: 1882387
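
A check that works in both cases could look roughly like the Python sketch below. It assumes, as the two commit messages suggest, that the older output has the counters at the top level and the newer output nests them under 'osdmap'. The function name and the sample data are hypothetical, and this is not the code from the review.

```python
def osd_in_percentage(stat):
    """Return the percentage of OSDs that are 'in' for either JSON layout."""
    # Assumption: newer output nests the counters under 'osdmap',
    # older output keeps them at the top level.
    counters = stat.get("osdmap", stat)
    num_osds = counters["num_osds"]
    if num_osds == 0:
        raise ValueError("cluster reports zero OSDs")
    return counters["num_in_osds"] / num_osds * 100


# Hypothetical sample data for the two layouts.
flat_layout = {"epoch": 37, "num_osds": 12, "num_up_osds": 12, "num_in_osds": 9}
nested_layout = {"osdmap": {"epoch": 37, "num_osds": 12, "num_up_osds": 12, "num_in_osds": 12}}

print(osd_in_percentage(flat_layout))    # 75.0
print(osd_in_percentage(nested_layout))  # 100.0
```

Falling back to the top-level object when 'osdmap' is absent keeps a single code path for both layouts.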

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Related fix proposed to tripleo-validations (stable/ussuri)

Related fix proposed to branch: stable/ussuri
Review: https://review.opendev.org/741426

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Related fix proposed to tripleo-validations (stable/train)

Related fix proposed to branch: stable/train
Review: https://review.opendev.org/741427

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Related fix merged to tripleo-validations (stable/train)

Reviewed: https://review.opendev.org/741427
Committed: https://git.openstack.org/cgit/openstack/tripleo-validations/commit/?id=02664d11456311f7c80a43eb9f8a9b8f01bb307d
Submitter: Zuul
Branch: stable/train

commit 02664d11456311f7c80a43eb9f8a9b8f01bb307d
Author: Giulio Fidente <email address hidden>
Date: Wed Jul 1 16:18:16 2020 +0200

    Make Get OSD stat percentage compatible with both Luminous and Nautilus

    On upgrade we might be using the newer validations against a Luminous
    cluster, hence the check needs to work in both cases.

    Change-Id: I16e7eb081640332ac69ee0b32d3d871fab40d447
    Related-Bug: 1882387
    (cherry picked from commit 35821c44dfe448ae6a381ee873e3945ab899e431)

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Related fix merged to tripleo-validations (stable/ussuri)

Reviewed: https://review.opendev.org/741426
Committed: https://git.openstack.org/cgit/openstack/tripleo-validations/commit/?id=f97940e6ad9dd41327b1f6d33b39a9c93505365d
Submitter: Zuul
Branch: stable/ussuri

commit f97940e6ad9dd41327b1f6d33b39a9c93505365d
Author: Giulio Fidente <email address hidden>
Date: Wed Jul 1 16:18:16 2020 +0200

    Make Get OSD stat percentage compatible with both Luminous and Nautilus

    On upgrade we might be using the newer validations against a Luminous
    cluster, hence the check needs to work in both cases.

    Depends-On: Id20ee6bf069f53739f7a5ea7983edf2fc9b445d2
    Change-Id: I16e7eb081640332ac69ee0b32d3d871fab40d447
    Related-Bug: 1882387
    (cherry picked from commit 35821c44dfe448ae6a381ee873e3945ab899e431)

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/tripleo-validations 11.4.0

This issue was fixed in the openstack/tripleo-validations 11.4.0 release.
