Kernel warnings about jbd2/rbd0-* tasks being blocked are to be expected: this system is running a single OSD with replication 1, so killing ceph-osd blocks any client using Ceph, including the mariadb container:
- rbd0 is mounted to /var/lib/kubelet/plugins/kubernetes.io/rbd/mounts/kube-rbd-image-kubernetes-dynamic-pvc-765f2082-701c-11e9-8184-9662bc166b57.
- rbd1 is mounted to /var/lib/kubelet/plugins/kubernetes.io/rbd/mounts/kube-rbd-image-kubernetes-dynamic-pvc-a312923d-701c-11e9-8184-9662bc166b57.
- mysqld (mariadb) is also blocked:
2019-05-06T18:30:29.604 controller-0 kernel: err [ 4562.873872] INFO: task mysqld:249372 blocked for more than 120 seconds.
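As a quick sketch of how these warnings can be pulled out of a saved kernel log, the snippet below greps the hung-task pattern from the sample line above (on a live system the same pattern appears in `dmesg` output, and the 120-second threshold is governed by the `kernel.hung_task_timeout_secs` sysctl):

```shell
# Sample kernel log line taken from this report.
log='2019-05-06T18:30:29.604 controller-0 kernel: err [ 4562.873872] INFO: task mysqld:249372 blocked for more than 120 seconds.'

# Extract the task name, PID, and blocked duration from the warning.
# On a live system, replace `echo "$log"` with `dmesg` to scan all warnings.
echo "$log" | grep -o 'task [^ ]*:[0-9]* blocked for more than [0-9]* seconds'
```

This prints `task mysqld:249372 blocked for more than 120 seconds`, confirming which process tripped the hung-task detector and for how long it was blocked.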
There are some questions here:
1. why is the system unresponsive after killing ceph-osd?
2. what is the exact timestamp when this happened? There are kernel log messages from the claimed hang at 2019-05-06T18:30:29.467 up to 2019-05-06T19:17:06.914, when another kernel instance starts to boot (the system was restarted), which means the kernel is not actually hung
3. after reboot, ceph-osd is killed again 3 times, then logs are collected without restarting the system again. Why did it not hang this time?