Kernel warnings about jbd2/rbd0-* tasks being blocked are to be expected: this system is running a single OSD with replication 1, so killing ceph-osd blocks any client using Ceph, including the mariadb container:
- rbd0 is mounted to /var/lib/kubelet/plugins/kubernetes.io/rbd/mounts/kube-rbd-image-kubernetes-dynamic-pvc-765f2082-701c-11e9-8184-9662bc166b57.
- rbd1 is mounted to /var/lib/kubelet/plugins/kubernetes.io/rbd/mounts/kube-rbd-image-kubernetes-dynamic-pvc-a312923d-701c-11e9-8184-9662bc166b57.
- mysqld (mariadb) is also blocked:
2019-05-06T18:30:29.604 controller-0 kernel: err [ 4562.873872] INFO: task mysqld:249372 blocked for more than 120 seconds.
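As a quick sketch of how these warnings can be pulled out of a saved kernel log, the snippet below greps the hung-task pattern from the sample line above (on a live system the same pattern appears in `dmesg` output, and the 120-second threshold is governed by the `kernel.hung_task_timeout_secs` sysctl):

```shell
# Sample kernel log line taken from this report.
log='2019-05-06T18:30:29.604 controller-0 kernel: err [ 4562.873872] INFO: task mysqld:249372 blocked for more than 120 seconds.'

# Extract the task name, PID, and blocked duration from the warning.
# On a live system, replace `echo "$log"` with `dmesg` to scan all warnings.
echo "$log" | grep -o 'task [^ ]*:[0-9]* blocked for more than [0-9]* seconds'
```

This prints `task mysqld:249372 blocked for more than 120 seconds`, confirming which process tripped the hung-task detector and for how long it was blocked.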
There are some questions here:
1. why is the system unresponsive after killing ceph-osd?
2. what is the exact timestamp when this happened? There are kernel log messages from the claimed hang at 2019-05-06T18:30:29.467 up to 2019-05-06T19:17:06.914, when another kernel instance starts to boot (the system was restarted), which means the kernel is not actually hung
3. after reboot, ceph-osd is killed again 3 times, then logs are collected without restarting the system again. Why did it not hang this time?