OSDs all move from Unit Is Ready to Non-pristine devices detected after a few minutes
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
Ceph OSD Charm | Fix Released | High | Dmitrii Shcherbakov |
Bug Description
We have a Ceph cluster containing 15 nodes, each with 10 NVMe devices, using Vault and encryption at rest.
The charm deploys correctly, and when we unlock Vault the units show as formatting and tuning the storage, eventually reaching active with the status "Unit is Ready (10 OSDs)". Then, over the course of about 10 minutes, they all switch to blocked with the status "Non-pristine devices detected, consult `list-disks`, `zap-disk` and `blacklist-*` actions."
Running the list-disks action on any of the units gives:
results:
  blacklist: '[]'
  disks: '[''/dev/nvme4n1'', ''/dev/nvme8n1'', ''/dev/nvme6n1'', ''/dev/nvme5n1'',
    ''/
    ''/
  non-pristine: '[''/dev/nvme4n1'', ''/dev/nvme8n1'', ''/dev/nvme6n1'', ''/dev/nvme5n1'',
    ''/
    ''/
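For reference, a minimal sketch of how such output is obtained and how a flagged device can be cleared, assuming juju 2.x action syntax and an illustrative unit name (zap-disk destroys any data on the device, so only run it against disks that are known to be safe to wipe):

    # Query one unit for its view of the attached disks
    $ juju run-action ceph-osd/0 list-disks --wait

    # Zap a device that is known to be safe to reuse (destructive)
    $ juju run-action ceph-osd/0 zap-disk devices=/dev/nvme4n1 i-really-mean-it=true --wait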
There are no Python tracebacks in the logs.
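For reference, one way to check the unit logs for tracebacks, assuming juju 2.x syntax and an illustrative unit name:

    $ juju debug-log --replay --include ceph-osd/0 | grep -i traceback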
Changed in charm-ceph-osd:
status: Incomplete → New

Changed in charm-ceph-osd:
status: Fix Committed → Fix Released
Please add the juju status and the sanitized bundle used to deploy this. A juju crashdump is also ideal for analysis, though it may contain private information.
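For reference, one way to collect those artifacts, assuming juju 2.x and the juju-crashdump tool (sanitize the outputs before attaching):

    # Capture the model status
    $ juju status --format=yaml > juju-status.yaml

    # Export the deployed bundle; remove any secrets before sharing
    $ juju export-bundle > bundle.yaml

    # Gather logs and unit state; may contain private information
    $ juju-crashdump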