Comment 9 for bug 1827936

Tingjie Chen (silverhandy) wrote :

It is hard for me to reproduce the kernel hang. I killed the ceph-osd process many times, but the thread recovers after a few seconds.

The log:
--------------------
2019-05-06T18:30:05.512 controller-0 kernel: warning [ 4538.906310] libceph: osd0 192.168.204.3:6800 socket error on write
2019-05-06T18:30:29.450 controller-0 kernel: err [ 4562.830603] INFO: task jbd2/rbd0-8:126770 blocked for more than 120 seconds.
...
2019-05-06T18:30:29.467 controller-0 kernel: err [ 4562.847516] INFO: task jbd2/rbd1-8:126832 blocked for more than 120 seconds.

This log means the process jbd2/rbd0-8 stayed in uninterruptible sleep for more than 120 seconds. This state can be caused by waiting for disk I/O, by vfork(), and by many other cases.
jbd2/rbd0-8 is the ext4 journal thread for a Ceph RADOS Block Device (RBD, /dev/rbd0), so it is probably blocked waiting for I/O after the ceph-osd process is killed.
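
For anyone checking their own logs for the same symptom, here is a rough sketch in Python that pulls the hung-task warnings out of kernel log lines and reads the device name out of a jbd2 thread name. The helper names find_hung_tasks and journal_device are hypothetical, and the regular expressions only assume the line format shown in the log above (a task name without spaces, and a jbd2 name of the form jbd2/<device>-<number>).

```python
import re

# Assumed format, taken from the log lines quoted in this comment:
#   "INFO: task <name>:<pid> blocked for more than <N> seconds."
HUNG_TASK = re.compile(
    r"INFO: task (?P<comm>\S+):(?P<pid>\d+) "
    r"blocked for more than (?P<secs>\d+) seconds"
)


def find_hung_tasks(lines):
    """Return (task name, pid, seconds) for each hung-task warning found."""
    hits = []
    for line in lines:
        m = HUNG_TASK.search(line)
        if m:
            hits.append((m.group("comm"), int(m.group("pid")), int(m.group("secs"))))
    return hits


def journal_device(comm):
    """For a thread named 'jbd2/<device>-<number>', return <device>, else None."""
    m = re.fullmatch(r"jbd2/(.+)-\d+", comm)
    return m.group(1) if m else None
```

Feeding it the lines from the log above should report jbd2/rbd0-8 and jbd2/rbd1-8, and journal_device() maps those names to rbd0 and rbd1, which is how the blocked threads can be tied back to the block devices.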

Maria, when you hit the hang, had you deployed the containerized OpenStack application Helm tarball?

BTW, for Khalil: I did not actually remove your tags. It probably happened because I added the stx.storage tag without refreshing the page, and by that time you had already added several tags :)