[bionic][ussuri] pool mirroring fails due to HEALTH warning

Bug #1892201 reported by Alex Kavanagh
This bug affects 2 people

Affects: Ceph RBD Mirror Charm
Status: Triaged
Importance: Critical
Assigned to: Unassigned

Bug Description

Disabling the test for now, as the health warning cannot currently be resolved.

Within ceph:

# ceph version
ceph version 15.2.1 (9fd2f65f91d9246fae2c841a6222d34d121680ee) octopus (stable)
root@juju-c6792d-zaza-0ae323d9655e-0:/home/ubuntu# ceph versions
{
    "mon": {
        "ceph version 15.2.1 (9fd2f65f91d9246fae2c841a6222d34d121680ee) octopus (stable)": 3
    },
    "mgr": {
        "ceph version 15.2.1 (9fd2f65f91d9246fae2c841a6222d34d121680ee) octopus (stable)": 3
    },
    "osd": {
        "ceph version 15.2.1 (9fd2f65f91d9246fae2c841a6222d34d121680ee) octopus (stable)": 3
    },
    "mds": {},
    "rbd-mirror": {
        "ceph version 15.2.1 (9fd2f65f91d9246fae2c841a6222d34d121680ee) octopus (stable)": 1
    },
    "overall": {
        "ceph version 15.2.1 (9fd2f65f91d9246fae2c841a6222d34d121680ee) octopus (stable)": 10
    }
}

On ceph-rbd-mirror/0:

# rbd --id rbd-mirror.juju-c6792d-zaza-0ae323d9655e-12 mirror pool status glance --verbose
health: WARNING
daemon health: OK
image health: WARNING
images: 1 total
    1 unknown

DAEMONS
service 5358:
  instance_id: 5812
  client_id: juju-c6792d-zaza-0ae323d9655e-12
  hostname: juju-c6792d-zaza-0ae323d9655e-12
  version: 15.2.1
  leader: true
  health: OK

IMAGES
3f5bd246-b43a-46fd-916b-63deaeee07ca:
  global_id: 445d9987-6e42-44bb-8dd9-6c42a79b388b
  state: up+stopped
  description: local image is primary
  service: juju-c6792d-zaza-0ae323d9655e-12 on juju-c6792d-zaza-0ae323d9655e-12
  last_update: 2020-08-19 13:41:59

# rbd --cluster remote --id rbd-mirror.juju-c6792d-zaza-0ae323d9655e-12 mirror pool status glance --verbose
health: WARNING
daemon health: OK
image health: WARNING
images: 1 total
    1 unknown

DAEMONS
service 4977:
  instance_id: 5425
  client_id: juju-c6792d-zaza-0ae323d9655e-13
  hostname: juju-c6792d-zaza-0ae323d9655e-13
  version: 15.2.1
  leader: true
  health: OK

IMAGES
3f5bd246-b43a-46fd-916b-63deaeee07ca:
  global_id: 445d9987-6e42-44bb-8dd9-6c42a79b388b
  state: up+replaying
  description: replaying, master_position=[object_number=20, tag_tid=1, entry_tid=22020], mirror_position=[object_number=20, tag_tid=1, entry_tid=22020], entries_behind_master=0
  service: juju-c6792d-zaza-0ae323d9655e-13 on juju-c6792d-zaza-0ae323d9655e-13
  last_update: 2020-08-19 13:42:53

That is, despite the image being replicated, up to date, and otherwise okay, the image health is still 'unknown' and hence a warning. This means that the status looks like:

ceph-rbd-mirror-b/0* blocked idle 13 172.20.0.17 Unit is ready (Pools WARNING (2) OK (1) Images unknown (2))
ceph-rbd-mirror/0* blocked idle 12 172.20.0.18 Unit is ready (Pools WARNING (2) OK (1) Images unknown (2))

And thus, the test fails.
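The failure boils down to the aggregate `image health` field saying WARNING while every individual image is in a known-good state. As a minimal sketch of that distinction (not the charm's actual logic), the following example classifies images by their per-image state instead of the summary field. The JSON shape here is assumed from the plain-text output above; the real `--format json` layout may differ by release.

```python
import json

# Hypothetical status resembling `rbd mirror pool status glance --verbose`
# output above, expressed as JSON (field names assumed, not verified).
status_json = '''
{
  "summary": {"health": "WARNING", "daemon_health": "OK",
              "image_health": "WARNING", "states": {"unknown": 1}},
  "images": [
    {"name": "3f5bd246-b43a-46fd-916b-63deaeee07ca",
     "state": "up+stopped",
     "description": "local image is primary"}
  ]
}
'''

# States that are healthy in practice even when the summary says WARNING:
# a primary image is expected to sit in 'up+stopped', and a secondary in
# 'up+replaying' is actively mirroring.
OK_STATES = {"up+stopped", "up+replaying"}

def effectively_healthy(status: dict) -> bool:
    """Treat the pool as healthy if every image is in a known-good state,
    regardless of the aggregate image_health field."""
    return all(img["state"] in OK_STATES for img in status["images"])

status = json.loads(status_json)
print(effectively_healthy(status))  # True for the output shown above
```

Under this reading, both clusters in the logs above would report healthy, which is exactly the gap between what the test sees and what the mirror is actually doing.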

Changed in charm-ceph-rbd-mirror:
importance: Undecided → Critical
milestone: none → 20.10
OpenStack Infra (hudson-openstack) wrote: Related fix merged to charm-ceph-rbd-mirror (master)

Reviewed: https://review.opendev.org/717068
Committed: https://git.openstack.org/cgit/openstack/charm-ceph-rbd-mirror/commit/?id=5199c767137fe4fa2abcd619221facad40c6defe
Submitter: Zuul
Branch: master

commit 5199c767137fe4fa2abcd619221facad40c6defe
Author: Alex Kavanagh <email address hidden>
Date: Thu Apr 2 16:59:07 2020 +0100

    Add focal and ussuri bundles to the charm

    This patch updates the bundles to include up to focal-ussuri.
    The focal-ussuri bundle is in the dev bundles as it can't pass at the
    moment due to LP: #1865754.

    The bionic-ussuri bundle is in the dev bundles (i.e. not gate) as it
    fails due to LP: #1892201

    Also deal with the related bug where cinder-ceph requires the relation
    with a nova-compute unit.

    Related-Bug: #1881246
    Related-Bug: #1865754
    Related-Bug: #1892201
    Change-Id: I0a6f1de82ecc601509822277d657485e08dc893d

David Ames (thedac)
Changed in charm-ceph-rbd-mirror:
milestone: 20.10 → 21.01
David Ames (thedac)
Changed in charm-ceph-rbd-mirror:
milestone: 21.01 → none
David Ames (thedac) wrote:

Bug hygiene:

The triage plan is in the bug description:

1) Determine why the Ussuri/Octopus pool status health is in a warning state
2) Resolve it
3) Re-enable the Ussuri/Octopus tests

Changed in charm-ceph-rbd-mirror:
status: New → Triaged
Peter Matulis (petermatulis) wrote:

I set up RBD mirroring in two models ('site-a' and 'site-b') and then deployed OpenStack to one of them ('site-a'). After importing an image into Glance, `juju status` shows that the ceph-rbd-mirror unit in both models enters the 'blocked' state and issues a warning:

site-a-ceph-rbd-mirror/0* blocked idle 3 10.5.0.36 Unit is ready (Pools WARNING (1) Images unknown (1))

site-b-ceph-rbd-mirror/0* blocked idle 3 10.5.0.21 Unit is ready (Pools WARNING (1) Images unknown (1))

This is what I see for the 'glance' pool in both models:

$ juju ssh -m site-a site-a-ceph-mon/0 sudo rbd mirror pool status glance --verbose

health: WARNING
daemon health: OK
image health: WARNING
images: 1 total
    1 unknown

DAEMONS
service 4813:
  instance_id: 6267
  client_id: juju-ae4ac4-site-a-3
  hostname: juju-ae4ac4-site-a-3
  version: 15.2.8
  leader: true
  health: OK

IMAGES
29843766-46af-458c-a377-9fb415794e34:
  global_id: 3ce8603c-af5c-4bcd-91b8-1be0e827c294
  state: up+stopped
  description: local image is primary
  service: juju-ae4ac4-site-a-3 on juju-ae4ac4-site-a-3
  last_update: 2021-03-19 19:17:30

---------------------

$ juju ssh -m site-b site-b-ceph-mon/0 sudo rbd mirror pool status glance --verbose

health: WARNING
daemon health: OK
image health: WARNING
images: 1 total
    1 unknown

DAEMONS
service 4878:
  instance_id: 5446
  client_id: juju-1930d6-site-b-3
  hostname: juju-1930d6-site-b-3
  version: 15.2.8
  leader: true
  health: OK

IMAGES
29843766-46af-458c-a377-9fb415794e34:
  global_id: 3ce8603c-af5c-4bcd-91b8-1be0e827c294
  state: up+replaying
  description: replaying, {"bytes_per_second":0.0,"entries_behind_primary":0,"entries_per_second":0.0,"non_primary_position":{"entry_tid":33936,"object_number":32,"tag_tid":1},"primary_position":{"entry_tid":33936,"object_number":32,"tag_tid":1}}
  service: juju-1930d6-site-b-3 on juju-1930d6-site-b-3
  last_update: 2021-03-19 19:17:30

======

I've attached logs for the ceph-rbd-mirror unit in model 'site-a'.

Peter Matulis (petermatulis) wrote:

I neglected to mention that I'm running focal-victoria (and thus Ceph Octopus).

Peter Matulis (petermatulis) wrote:

I have now experienced symptoms similar to those in comment #3, but when using a single model.

Aurelien Lourot (aurelien-lourot) wrote:

This is now also hitting us on zOSCI in this review: https://review.opendev.org/c/openstack/charm-ceph-rbd-mirror/+/761549

Billy Olsen (billy-olsen) wrote:

Marking this as a duplicate of bug #1879749: it looks to be a keys issue and is being actively worked on there.
