Actions do not run correctly on new OSDs

Bug #1948451 reported by Dan Ardelean
Affects: Ceph OSD Charm
Status: New
Importance: Undecided
Assigned to: Unassigned

Bug Description

Hi,

I have a Ceph cluster with 3 ceph-mon units (rev 58) and 3 ceph-osd units (rev 312).
In the initial cluster I had one osd-device per OSD unit; this is a testing environment, so I used folder-based OSDs:

$ juju config ceph-osd osd-devices
/srv/osd

I added one more osd-device per OSD unit and everything was fine (with crush-initial-weight=0):

$ juju config ceph-osd osd-devices="/srv/osd /srv/osd2"
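
For reference, the zero initial weight comes from a charm option I had set beforehand (I believe this is the relevant option; it was set before the osd-devices change above):

$ juju config ceph-osd crush-initial-weight=0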

$ juju ssh ceph-mon/0 sudo ceph osd tree
ID CLASS WEIGHT TYPE NAME STATUS REWEIGHT PRI-AFF
-1 0.42339 root default
-3 0.14169 host juju-a21053-4
 1 hdd 0.14169 osd.1 up 1.00000 1.00000
 5 hdd 0 osd.5 up 1.00000 1.00000
-5 0.14090 host juju-a21053-5
 0 hdd 0.14090 osd.0 up 1.00000 1.00000
 4 hdd 0 osd.4 up 1.00000 1.00000
-7 0.14079 host juju-a21053-6
 2 hdd 0.14079 osd.2 up 1.00000 1.00000
 3 hdd 0 osd.3 up 1.00000 1.00000

Now I wanted to take the new OSDs (IDs 3, 4 and 5) out, but it only worked for the one with ID 5; for the other two, the charm failed to recognize them:

$ juju run-action ceph-osd/0 --wait osd-out osds=5
unit-ceph-osd-0:
  UnitId: ceph-osd/0
  id: "66"
  results:
    message: osd-out action was successfully executed for ceph OSD devices [5]
    outputs: "marked out osd.5. \n"
  status: completed
  timing:
    completed: 2021-10-22 11:16:20 +0000 UTC
    enqueued: 2021-10-22 11:16:14 +0000 UTC
    started: 2021-10-22 11:16:18 +0000 UTC

$ juju run-action ceph-osd/0 --wait osd-out osds=3
unit-ceph-osd-0:
  UnitId: ceph-osd/0
  id: "68"
  message: 'invalid ceph OSD device id: 3'
  results: {}
  status: failed
  timing:
    completed: 2021-10-22 11:16:23 +0000 UTC
    enqueued: 2021-10-22 11:16:22 +0000 UTC
    started: 2021-10-22 11:16:22 +0000 UTC

$ juju run-action ceph-osd/0 --wait osd-out osds=4
unit-ceph-osd-0:
  UnitId: ceph-osd/0
  id: "70"
  message: 'invalid ceph OSD device id: 4'
  results: {}
  status: failed
  timing:
    completed: 2021-10-22 11:16:28 +0000 UTC
    enqueued: 2021-10-22 11:16:26 +0000 UTC
    started: 2021-10-22 11:16:27 +0000 UTC

However, I can see the OSDs with "ceph osd find":

$ juju ssh ceph-mon/0 sudo ceph osd find osd.3
{
    "osd": 3,
    "ip": "10.0.8.154:6804/152993",
    "osd_fsid": "9cbe1c64-0f4c-4f86-b799-4a6707a06f7f",
    "crush_location": {
        "host": "juju-a21053-6",
        "root": "default"
    }
}

$ juju ssh ceph-mon/0 sudo ceph osd find osd.5
{
    "osd": 5,
    "ip": "10.0.8.155:6804/152759",
    "osd_fsid": "3741ec57-3e68-457a-a574-76c07b45d55a",
    "crush_location": {
        "host": "juju-a21053-4",
        "root": "default"
    }
}

$ juju ssh ceph-mon/0 sudo ceph osd find 4
{
    "osd": 4,
    "ip": "10.0.8.156:6804/153332",
    "osd_fsid": "fdd8546d-3f1b-4b32-a7d4-803fee5b5210",
    "crush_location": {
        "host": "juju-a21053-5",
        "root": "default"
    }
}

Using "osd-out osds=osd.3" has the same result, no osd found.

Other actions fail in the same way:

$ juju run-action ceph-osd/0 --wait stop osds=4
unit-ceph-osd-0:
  UnitId: ceph-osd/0
  id: "100"
  message: 'Action ''stop'' failed: Some services are not present on this unit: [''ceph-osd@4.service'']'
  results: {}
  status: failed
  timing:
    completed: 2021-10-22 11:29:33 +0000 UTC
    enqueued: 2021-10-22 11:29:31 +0000 UTC
    started: 2021-10-22 11:29:33 +0000 UTC
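
As a stopgap I can mark the new OSDs out directly on a monitor, which works but bypasses the charm action entirely:

$ juju ssh ceph-mon/0 sudo ceph osd out 4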
