Non-pristine devices detected, consult `list-disks`, `zap-disk` and `blacklist-*` actions.

Bug #1922110 reported by Marian Gasparovic
This bug affects 3 people
Affects: Ceph OSD Charm | Status: New | Importance: Undecided | Assigned to: Unassigned

Bug Description

ceph-osd rev 308; one OSD reported "Non-pristine devices detected".

Be aware that in the log you will find several zap-disk and add-disk actions I tried later, unsuccessfully, after the unit had stayed in the blocked state for some time.
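For reference, the actions were invoked roughly as follows (device path and exact parameter names assumed here, going by the charm's published action definitions):

juju run-action --wait ceph-osd/0 list-disks
juju run-action --wait ceph-osd/0 zap-disk devices=/dev/sdb i-really-mean-it=true
juju run-action --wait ceph-osd/0 add-disk osd-devices=/dev/sdb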

This is the lsblk output from a working OSD:

sdb 8:16 0 20G 0 disk
├─sdb1 8:17 0 1M 0 part
└─sdb2 8:18 0 18.6G 0 part
  ├─bcache0 252:0 0 20G 0 disk
  │ └─crypt-6cfafd34-fa00-4fef-af25-b65f984d20a1 253:1 0 20G 0 crypt
  │   └─ceph--6cfafd34--fa00--4fef--af25--b65f984d20a1-osd--block--6cfafd34--fa00--4fef--af25--b65f984d20a1 253:2 0 20G 0 lvm
  ├─bcache1 252:128 0 165.8G 0 disk
  │ └─crypt-d3ee73de-a4f8-4ab8-9e2e-26bbb28fb1ee 253:0 0 165.8G 0 crypt /var/lib/nova/instances
  └─bcache2

and this is from the failing ceph-osd/0:

sdb 8:16 0 20G 0 disk
├─sdb1 8:17 0 1M 0 part
└─sdb2 8:18 0 18.6G 0 part
  ├─bcache0 252:0 0 20G 0 disk
  │ └─crypt-9198e628-3e01-4b21-a2ff-3429044d1765 253:1 0 20G 0 crypt
  ├─bcache1 252:128 0 74.5G 0 disk /
  └─bcache2 252:256 0 165.8G 0 disk
    └─crypt-386fb0f2-43ae-474f-ad2f-ae069ff75ddf 253:0 0 165.8G 0 crypt /var/lib/nova/instances

This is the first time it has happened in this test environment; those nodes are redeployed often.

crashdump - https://oil-jenkins.canonical.com/artifacts/ff6fa09a-7b50-44ec-afe7-1c6d06fc94a6/generated/generated/openstack/juju-crashdump-openstack-2021-03-31-15.58.15.tar.gz

Tags: cdo-qa
Alex Kavanagh (ajkavanagh) wrote:

This is a safety feature of the charm to stop it overwriting data if it is incorrectly configured. For it to be a bug, it's necessary to check that the block device configured for the charm is pristine; if it's not, then the charm is working as advertised.

Is it possible to verify that a pristine device was configured for the charm and that the charm incorrectly decided it wasn't pristine?

Changed in charm-ceph-osd:
status: New → Incomplete
Marian Gasparovic (marosg) wrote:

Those nodes are redeployed over and over again during our tests, and we have encountered this only once so far. Does that help?

Changed in charm-ceph-osd:
status: Incomplete → New
Alex Kavanagh (ajkavanagh) wrote:

> Those nodes are redeployed over and over again during our tests, and we have encountered this only once so far. Does that help?

Not really. The charm detected a non-pristine device. There are two possibilities:

1. It was non-pristine, so the charm did the right thing; i.e. the block device isn't pristine.
2. The charm made a mistake; the block device is pristine, but it wasn't correctly detected as such.

In order to resolve which option is correct, we need to establish whether the device was pristine, i.e. look at the first block and see if it has any data in it.
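One way to inspect this on the node (a suggested check, not necessarily the exact test the charm performs) is to dump the first chunk of the device and look for non-zero bytes, and to ask wipefs what signatures it can see without erasing anything:

sudo dd if=/dev/sdb bs=1M count=1 2>/dev/null | hexdump -C | head -n 5
sudo wipefs --no-act /dev/sdb

On a truly pristine device, hexdump prints a single row of zeros followed by '*', and wipefs reports no signatures; any GPT/MBR, LVM or filesystem signature means the device is not pristine.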

Changed in charm-ceph-osd:
status: New → Incomplete
Launchpad Janitor (janitor) wrote:

[Expired for OpenStack ceph-osd charm because there has been no activity for 60 days.]

Changed in charm-ceph-osd:
status: Incomplete → Expired
Bas de Bruijne (basdbruijne) wrote:

How do we check if the first block has any data? Is there a command that we can run when collecting the crashdumps?
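For example, would something along these lines (device list assumed; it would need to match the charm's osd-devices configuration) be suitable to add to the collection scripts?

for dev in /dev/sdb /dev/sdc; do
    echo "== $dev =="
    sudo dd if="$dev" bs=4K count=1 2>/dev/null | hexdump -C | head -n 5
done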

Changed in charm-ceph-osd:
status: Expired → New