tripleo

TASK [ceph : Get OSD stat percentage]: null and null cannot be divided

Bug #1882387 reported by John Fulton on 2020-06-06

This bug affects 1 person

Affects   Status         Importance   Assigned to   Milestone
tripleo   Fix Released   High         John Fulton   tripleo victoria-1

Bug Description

Deployment with internal Ceph fails with the following message:

TASK [ceph : Get OSD stat percentage] ******************************************************************
Friday 05 June 2020 20:09:42 +0000 (0:00:00.298) 0:33:33.740 ***********
fatal: [undercloud -> 192.168.24.14]: FAILED! => {"ansible_facts": {"discovered_interpreter_python": "/usr/libexec/platform-python"}, "changed": true, "cmd": "\"podman\" exec \"ceph-mon-oc0-controller-0\" ceph --cluster \"ceph\" osd stat -f json | jq '( (.num_in_osds) / (.num_osds) ) * 100'", "delta": "0:00:00.664333", "end": "2020-06-05 20:09:43.389273", "msg": "non-zero return code", "rc": 5, "start": "2020-06-05 20:09:42.724940", "stderr": "jq: error (at <stdin>:1): null (null) and null (null) cannot be divided", "stderr_lines": ["jq: error (at <stdin>:1): null (null) and null (null) cannot be divided"], "stdout": "", "stdout_lines": []}

John Fulton (jfulton-org) wrote on 2020-06-06:

Ceph health is actually fine: all 12 OSDs are up and in, and the only warning is too few PGs per OSD.

[root@oc0-controller-0 ~]# podman exec ceph-mon-$HOSTNAME ceph -s
  cluster:
    id: a7c1c1e4-5cd6-4f1c-8bc2-a37140ee09a8
    health: HEALTH_WARN
            too few PGs per OSD (8 < min 30)

  services:
    mon: 3 daemons, quorum oc0-controller-2,oc0-controller-0,oc0-controller-1 (age 22h)
    mgr: oc0-controller-2(active, since 22h), standbys: oc0-controller-0, oc0-controller-1
    osd: 12 osds: 12 up (since 22h), 12 in (since 22h)

  data:
    pools: 3 pools, 96 pgs
    objects: 0 objects, 0 B
    usage: 12 GiB used, 588 GiB / 600 GiB avail
    pgs: 96 active+clean

[root@oc0-controller-0 ~]#

John Fulton (jfulton-org) wrote on 2020-06-06:

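num_in_osds and num_osds are set to integers in the JSON [1], and jq-1.6 is in the ceph container [2].

The filter in the jq_osd_percentage_filter variable [3] is not retrieving the values: it reads .num_in_osds and .num_osds at the top level of the JSON, but both keys are nested under "osdmap" [1], so jq gets null for each and cannot divide them.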

[1]
[root@oc0-controller-0 ~]# podman exec ceph-mon-oc0-controller-0 ceph --cluster ceph osd stat -f json | jq .
{
  "osdmap": {
    "epoch": 37,
    "num_osds": 12,
    "num_up_osds": 12,
    "num_in_osds": 12,
    "num_remapped_pgs": 0
  }
}
[root@oc0-controller-0 ~]#

[2]
# podman exec ceph-mon-oc0-controller-0 jq --version
jq-1.6

[3]
https://github.com/openstack/tripleo-validations/blob/master/roles/ceph/tasks/ceph-health.yaml#L68
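
To illustrate the lookup problem outside of jq, here is a minimal Python sketch that parses the JSON from [1]. It is only an analogy for what the filter does, not code from tripleo-validations; the variable names are made up for this example.

```python
import json

# Output of `ceph osd stat -f json`, copied from [1] above.
OSD_STAT_JSON = """
{
  "osdmap": {
    "epoch": 37,
    "num_osds": 12,
    "num_up_osds": 12,
    "num_in_osds": 12,
    "num_remapped_pgs": 0
  }
}
"""

stat = json.loads(OSD_STAT_JSON)

# Equivalent of `.num_in_osds` and `.num_osds`: top-level lookups.
# Neither key exists at the top level, so both come back as None,
# which corresponds to the "null and null" in the jq error.
top_level_in = stat.get("num_in_osds")
top_level_total = stat.get("num_osds")

# Equivalent of `.osdmap.num_in_osds` and `.osdmap.num_osds`:
# going through the "osdmap" key finds the integers.
nested_in = stat["osdmap"]["num_in_osds"]
nested_total = stat["osdmap"]["num_osds"]
percentage = nested_in / nested_total * 100

print(top_level_in, top_level_total, percentage)
```

With the data from [1] the top-level lookups are both empty, while the nested lookups give 12 of 12 OSDs in.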

OpenStack Infra (hudson-openstack) wrote on 2020-06-06: Fix proposed to tripleo-validations (master)

Fix proposed to branch: master
Review: https://review.opendev.org/733971

Changed in tripleo:
status:	Triaged → In Progress

John Fulton (jfulton-org) on 2020-06-06

tags:

added: ussuri-backport-potential

OpenStack Infra (hudson-openstack) wrote on 2020-06-08: Fix merged to tripleo-validations (master)

Reviewed: https://review.opendev.org/733971
Committed: https://git.openstack.org/cgit/openstack/tripleo-validations/commit/?id=aa330a8e99e7280c87ee8b1988b7951f6c5addb0
Submitter: Zuul
Branch: master

commit aa330a8e99e7280c87ee8b1988b7951f6c5addb0
Author: John Fulton <email address hidden>
Date: Sat Jun 6 19:57:58 2020 +0000

    Update Ceph role's Get OSD stat to use new data structure

    The variables for 'num_osds' and similar are now enclosed
    in an additional map whose key is 'osdmap' so we need to
    use that key when extracting them.

    Change-Id: I40bed1fdc9cd39295a1b9f9aaed21d3814d7e2b5
    Closes-Bug: #1882387

Changed in tripleo:
status:	In Progress → Fix Released

OpenStack Infra (hudson-openstack) wrote on 2020-06-08: Fix proposed to tripleo-validations (stable/ussuri)

Fix proposed to branch: stable/ussuri
Review: https://review.opendev.org/734068

OpenStack Infra (hudson-openstack) wrote on 2020-06-08: Fix proposed to tripleo-validations (stable/train)

Fix proposed to branch: stable/train
Review: https://review.opendev.org/734069

OpenStack Infra (hudson-openstack) wrote on 2020-06-09: Fix merged to tripleo-validations (stable/ussuri)

Reviewed: https://review.opendev.org/734068
Committed: https://git.openstack.org/cgit/openstack/tripleo-validations/commit/?id=0bc2cad4eeb3fb25e614644eec1e075f5eee8049
Submitter: Zuul
Branch: stable/ussuri

commit 0bc2cad4eeb3fb25e614644eec1e075f5eee8049
Author: John Fulton <email address hidden>
Date: Sat Jun 6 19:57:58 2020 +0000

    Update Ceph role's Get OSD stat to use new data structure

    The variables for 'num_osds' and similar are now enclosed
    in an additional map whose key is 'osdmap' so we need to
    use that key when extracting them.

    Change-Id: I40bed1fdc9cd39295a1b9f9aaed21d3814d7e2b5
    Closes-Bug: #1882387
    (cherry picked from commit aa330a8e99e7280c87ee8b1988b7951f6c5addb0)

tags:

added: in-stable-ussuri

OpenStack Infra (hudson-openstack) wrote on 2020-06-11: Fix merged to tripleo-validations (stable/train)

Reviewed: https://review.opendev.org/734069
Committed: https://git.openstack.org/cgit/openstack/tripleo-validations/commit/?id=08f469d7e85c16aa28e8a3f43450089778ea3d00
Submitter: Zuul
Branch: stable/train

commit 08f469d7e85c16aa28e8a3f43450089778ea3d00
Author: John Fulton <email address hidden>
Date: Sat Jun 6 19:57:58 2020 +0000

    Update Ceph role's Get OSD stat to use new data structure

    The variables for 'num_osds' and similar are now enclosed
    in an additional map whose key is 'osdmap' so we need to
    use that key when extracting them.

    Change-Id: I40bed1fdc9cd39295a1b9f9aaed21d3814d7e2b5
    Closes-Bug: #1882387
    (cherry picked from commit aa330a8e99e7280c87ee8b1988b7951f6c5addb0)

tags:

added: in-stable-train

OpenStack Infra (hudson-openstack) wrote on 2020-07-01: Related fix proposed to tripleo-validations (master)

Related fix proposed to branch: master
Review: https://review.opendev.org/738855

OpenStack Infra (hudson-openstack) wrote on 2020-07-13: Related fix merged to tripleo-validations (master)

#10

Reviewed: https://review.opendev.org/738855
Committed: https://git.openstack.org/cgit/openstack/tripleo-validations/commit/?id=35821c44dfe448ae6a381ee873e3945ab899e431
Submitter: Zuul
Branch: master

commit 35821c44dfe448ae6a381ee873e3945ab899e431
Author: Giulio Fidente <email address hidden>
Date: Wed Jul 1 16:18:16 2020 +0200

    Make Get OSD stat percentage compatible with both Luminous and Nautilus

    On upgrade we might be using the newer validations against a Luminous
    cluster, hence the check needs to work in both cases.

    Change-Id: I16e7eb081640332ac69ee0b32d3d871fab40d447
    Related-Bug: 1882387
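
The idea of a check that works against both layouts can be sketched in Python as follows. This is a hedged illustration, not the content of the merged change: the function name osd_in_percentage is hypothetical, and the flat Luminous layout is an assumption inferred from the earlier commit message, which says the counters are "now enclosed in an additional map whose key is 'osdmap'".

```python
def osd_in_percentage(osd_stat: dict) -> float:
    """Return the percentage of OSDs that are 'in' for either layout."""
    # Nautilus nests the counters under "osdmap" (see [1] in the bug).
    # Assumption: Luminous reports the same counters at the top level,
    # so fall back to the document itself when "osdmap" is absent.
    counters = osd_stat.get("osdmap", osd_stat)
    total = counters["num_osds"]
    if not total:
        # Avoid a division by zero on a cluster that reports no OSDs.
        raise ValueError("no OSDs reported")
    return counters["num_in_osds"] / total * 100


# Sample inputs: the nested one mirrors [1]; the flat one is invented
# for illustration and does not come from a real Luminous cluster.
nautilus_style = {"osdmap": {"num_osds": 12, "num_in_osds": 12}}
luminous_style = {"num_osds": 12, "num_in_osds": 9}

print(osd_in_percentage(nautilus_style))
print(osd_in_percentage(luminous_style))
```

Falling back to the top-level document keeps a single code path for the arithmetic, so only the lookup differs between the two releases.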

OpenStack Infra (hudson-openstack) wrote on 2020-07-16: Related fix proposed to tripleo-validations (stable/ussuri)

#11

Related fix proposed to branch: stable/ussuri
Review: https://review.opendev.org/741426

OpenStack Infra (hudson-openstack) wrote on 2020-07-16: Related fix proposed to tripleo-validations (stable/train)

#12

Related fix proposed to branch: stable/train
Review: https://review.opendev.org/741427

OpenStack Infra (hudson-openstack) wrote on 2020-07-18: Related fix merged to tripleo-validations (stable/train)

#13

Reviewed: https://review.opendev.org/741427
Committed: https://git.openstack.org/cgit/openstack/tripleo-validations/commit/?id=02664d11456311f7c80a43eb9f8a9b8f01bb307d
Submitter: Zuul
Branch: stable/train

commit 02664d11456311f7c80a43eb9f8a9b8f01bb307d
Author: Giulio Fidente <email address hidden>
Date: Wed Jul 1 16:18:16 2020 +0200

    Make Get OSD stat percentage compatible with both Luminous and Nautilus

    On upgrade we might be using the newer validations against a Luminous
    cluster, hence the check needs to work in both cases.

    Change-Id: I16e7eb081640332ac69ee0b32d3d871fab40d447
    Related-Bug: 1882387
    (cherry picked from commit 35821c44dfe448ae6a381ee873e3945ab899e431)

OpenStack Infra (hudson-openstack) wrote on 2020-07-31: Related fix merged to tripleo-validations (stable/ussuri)

#14

Reviewed: https://review.opendev.org/741426
Committed: https://git.openstack.org/cgit/openstack/tripleo-validations/commit/?id=f97940e6ad9dd41327b1f6d33b39a9c93505365d
Submitter: Zuul
Branch: stable/ussuri

commit f97940e6ad9dd41327b1f6d33b39a9c93505365d
Author: Giulio Fidente <email address hidden>
Date: Wed Jul 1 16:18:16 2020 +0200

    Make Get OSD stat percentage compatible with both Luminous and Nautilus

    On upgrade we might be using the newer validations against a Luminous
    cluster, hence the check needs to work in both cases.

    Depends-On: Id20ee6bf069f53739f7a5ea7983edf2fc9b445d2
    Change-Id: I16e7eb081640332ac69ee0b32d3d871fab40d447
    Related-Bug: 1882387
    (cherry picked from commit 35821c44dfe448ae6a381ee873e3945ab899e431)