No OSDs have been initialized on random units, with "No block devices detected using current configuration"
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
Ceph OSD Charm | Fix Released | High | Liam Young |
bcache-tools (Ubuntu) | Invalid | Undecided | Unassigned |
Bug Description
When deploying a bundle with Ceph and encryption enabled, some random units fail with "No block devices detected using current configuration", and some other units are missing some of the OSDs that should be present.
bundle: https:/
juju status: http://
juju config ceph-osd: http://
juju crashdump: https:/
### logs from ceph-osd/0, which has one OSD missing
sosreport: https:/
machine curtin config: http://
juju debug-log ceph-osd/0 --replay: http://
aa-status: http://
lsblk: http://
blkid: http://
# The log for one OSD is missing, so it looks like its creation was never triggered
$ sudo ls -lah /var/log/ceph
total 1.4M
drwxrws--T 2 ceph ceph 4.0K Jan 22 15:40 .
drwxrwxr-x 21 root syslog 4.0K Jan 22 15:17 ..
-rw-r--r-- 1 ceph ceph 93K Jan 22 15:39 ceph-osd.16.log
-rw-r--r-- 1 ceph ceph 91K Jan 22 15:39 ceph-osd.23.log
-rw-r--r-- 1 ceph ceph 96K Jan 22 15:39 ceph-osd.30.log
-rw-r--r-- 1 ceph ceph 96K Jan 22 15:40 ceph-osd.37.log
-rw-r--r-- 1 ceph ceph 96K Jan 22 15:40 ceph-osd.44.log
-rw-r--r-- 1 ceph ceph 94K Jan 22 15:40 ceph-osd.51.log
-rw-r--r-- 1 ceph ceph 94K Jan 22 15:38 ceph-osd.9.log
-rw-r--r-- 1 root ceph 697K Jan 22 15:40 ceph-volume.log
/var/log/ceph from ceph-osd/0: https:/
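To make the "one OSD missing" check repeatable across units, the per-OSD log files can be counted from a listing like the one above. This is only a rough sketch: the function names and the `expected_count` parameter are hypothetical, and the number of OSDs a unit should have must come from the bundle, not from this script.

```python
import re

# Matches per-OSD log names such as "ceph-osd.16.log" at the end of a line.
# "ceph-volume.log" deliberately does not match.
OSD_LOG = re.compile(r"ceph-osd\.(\d+)\.log$")


def osd_ids_from_listing(lines):
    """Return the sorted OSD ids found in file names or `ls -l` lines."""
    ids = []
    for line in lines:
        match = OSD_LOG.search(line.strip())
        if match:
            ids.append(int(match.group(1)))
    return sorted(ids)


def missing_osd_count(lines, expected_count):
    """Return how many OSD logs are absent (never negative).

    expected_count is an assumption supplied by the caller.
    """
    return max(expected_count - len(osd_ids_from_listing(lines)), 0)
```

Feeding it the `ls -lah /var/log/ceph` output from ceph-osd/0 yields seven OSD ids; the empty directory on ceph-osd/3 yields none.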
### logs from ceph-osd/3, which failed to initialize all OSDs
sosreport: https:/
machine curtin config: http://
juju debug-log -i ceph-osd/3 --replay: http://
lsblk: http://
blkid: http://
$ sudo ls -lah /var/log/ceph
total 8.0K
drwxrws--T 2 ceph ceph 4.0K Oct 9 08:27 .
drwxrwxr-x 21 root syslog 4.0K Jan 22 15:17 ..
The root cause _may be_ related to https:/
At the time of filing this bug the system had been in the "idle" state for several hours, but something is still triggering recreation of the symlinks:
$ ls -lah /dev/disk/by-dname
total 0
drwxr-xr-x 2 root root 400 Jan 22 21:54 .
drwxr-xr-x 9 root root 180 Jan 22 15:09 ..
lrwxrwxrwx 1 root root 13 Jan 22 21:54 bcache0 -> ../../bcache5
lrwxrwxrwx 1 root root 13 Jan 22 21:54 bcache1 -> ../../bcache8
lrwxrwxrwx 1 root root 13 Jan 22 21:54 bcache2 -> ../../bcache9
lrwxrwxrwx 1 root root 13 Jan 22 21:54 bcache3 -> ../../bcache6
lrwxrwxrwx 1 root root 13 Jan 22 21:54 bcache4 -> ../../bcache7
lrwxrwxrwx 1 root root 13 Jan 22 21:54 bcache5 -> ../../bcache1
lrwxrwxrwx 1 root root 13 Jan 22 21:54 bcache6 -> ../../bcache0
lrwxrwxrwx 1 root root 13 Jan 22 21:54 bcache7 -> ../../bcache4
lrwxrwxrwx 1 root root 13 Jan 22 21:54 bcache8 -> ../../bcache3
lrwxrwxrwx 1 root root 13 Jan 22 21:54 bcache9 -> ../../bcache2
$ date
Tue Jan 22 22:01:43 UTC 2019
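To confirm that the symlinks really are being recreated on an idle system, two snapshots of the directory taken some time apart can be compared. The sketch below is an assumption about how one might check this, not something the charm does; `snapshot_links` and `diff_snapshots` are hypothetical helper names.

```python
import os


def snapshot_links(directory):
    """Return {link name: (resolved target, link mtime)} for each symlink."""
    snapshot = {}
    with os.scandir(directory) as entries:
        for entry in entries:
            if entry.is_symlink():
                target = os.path.realpath(entry.path)
                # lstat reads the timestamp of the link itself, not its target.
                mtime = os.lstat(entry.path).st_mtime
                snapshot[entry.name] = (target, mtime)
    return snapshot


def diff_snapshots(before, after):
    """Return {name: (old, new)} for links that changed, appeared or vanished."""
    changed = {}
    for name in sorted(set(before) | set(after)):
        if before.get(name) != after.get(name):
            changed[name] = (before.get(name), after.get(name))
    return changed
```

Calling `snapshot_links("/dev/disk/by-dname")` twice, a few minutes apart, and passing both results to `diff_snapshots` should return an empty dict on a quiet system. A non-empty result with unchanged targets but newer mtimes would match the behaviour shown above.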
Changed in charm-ceph-osd:
assignee: nobody → Liam Young (gnuoy)
importance: Undecided → Critical
status: New → Confirmed
Changed in charm-ceph-osd:
importance: Critical → High
status: In Progress → Incomplete
Changed in charm-ceph-osd:
status: Incomplete → In Progress
Changed in charm-ceph-osd:
milestone: none → 19.04
Changed in charm-ceph-osd:
status: Fix Committed → Fix Released
Subscribed field-critical as this blocks us from starting handover preparations.