Comment 7 for bug 1881747

Martin Strange (mstrange) wrote:

For what it's worth, I've now had the exact same problem, which led me here.

On a bare-metal 20.04 install using whole blank HDDs as OSDs (/dev/sda etc.), installing with cephadm worked fine with an XFS root, but when I later reinstalled with a ZFS root I got the same behaviour described above, despite device zaps and everything else I could think of.

It seems that unit.run does two separate steps: first a "/usr/sbin/ceph-volume lvm activate 0" and then a "/usr/bin/ceph-osd -n osd.0", each in its own container.
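
Paraphrasing from memory rather than quoting my unit.run (the image name, fsid and exact podman flags below are placeholders), the two steps amount to something like:

    # step 1: short-lived container that just runs the activate
    /usr/bin/podman run --rm --privileged -v /dev:/dev \
        -v /var/lib/ceph/<fsid>/osd.0:/var/lib/ceph/osd/ceph-0 \
        --entrypoint /usr/sbin/ceph-volume <ceph-image> \
        lvm activate 0 <osd-fsid> --no-systemd

    # step 2: the long-running osd container
    /usr/bin/podman run --rm --privileged -v /dev:/dev \
        -v /var/lib/ceph/<fsid>/osd.0:/var/lib/ceph/osd/ceph-0 \
        --entrypoint /usr/bin/ceph-osd <ceph-image> \
        -n osd.0 -f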

The activate step does its work inside a tmpfs mounted at "/var/lib/ceph/osd/ceph-0", which is thrown away entirely when that container exits, so the "/var/lib/ceph/osd/ceph-0/block" symlink it creates is gone before the ceph-osd container starts up. ceph-osd then can't find a "block" any more and declares an unknown type because of that.
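
You can see the same effect with nothing ceph-specific involved, just default mount propagation (the alpine image and /tmp/osd-dir here are arbitrary):

    mkdir -p /tmp/osd-dir
    podman run --rm --privileged -v /tmp/osd-dir:/work alpine sh -c \
        'mount -t tmpfs tmpfs /work && ln -s /dev/null /work/block && ls -l /work'
    ls -l /tmp/osd-dir   # empty again: the tmpfs, and the symlink in it, died with the container

A tmpfs mounted over a bind-mounted volume inside one container is private to that container's mount namespace, so nothing of it is left for the next container that bind-mounts the same host directory.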

I don't understand how that could ever work, so maybe the ZFS root isn't relevant after all, or maybe it somehow causes activate to use the tmpfs?

Note that if I run a single container manually and do the same activate followed by ceph-osd, then the OSD does come up.
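
Roughly (fsid and image are placeholders again, and the same config/keyring mounts that unit.run uses would also be needed), a single-container run of both steps looks like:

    podman run --rm --privileged --net=host -v /dev:/dev \
        -v /var/lib/ceph/<fsid>/osd.0:/var/lib/ceph/osd/ceph-0 \
        <ceph-image> sh -c \
        '/usr/sbin/ceph-volume lvm activate 0 <osd-fsid> --no-systemd && \
         exec /usr/bin/ceph-osd -n osd.0 -f'

Because both commands share one container, the tmpfs from the activate is still mounted when ceph-osd starts.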

How is "/var/lib/ceph/osd/ceph-0/block" meant to persist between running activate in one container and then running ceph-osd in a different one afterwards? Or is the "/usr/bin/mount -t tmpfs tmpfs /var/lib/ceph/osd/ceph-0" that activate does somehow the source of this problem?