Comment 10 for bug 1732523

Dmitrii Shcherbakov (dmitriis) wrote :

To QA: could you include the output of systemctl status <svc> for all services in the reports?
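A small helper that gathers that output for a list of units might look like the sketch below. The function name `collect_status` is hypothetical. `run` is injectable only so the helper can be exercised without a live systemd. `check=False` matters because `systemctl status` exits non-zero for failed or inactive units, and those are exactly the ones worth reporting.

```python
import subprocess
from typing import Callable, Dict, List


def collect_status(units: List[str],
                   run: Callable[..., subprocess.CompletedProcess] = subprocess.run
                   ) -> Dict[str, str]:
    """Return the `systemctl status` output for each unit, keyed by unit name."""
    report = {}
    for unit in units:
        # A failed unit makes systemctl exit non-zero; check=False keeps
        # one failed unit from aborting the whole report.
        proc = run(["systemctl", "status", "--no-pager", "--full", unit],
                   capture_output=True, text=True, check=False)
        report[unit] = proc.stdout + proc.stderr
    return report
```

For example, `collect_status(["ceph-osd@0.service", "nova-compute.service"])` returns one status dump per unit, ready to be written into the report.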

To charmers:

The error message could certainly be improved here:

https://github.com/openstack/charm-ceph-osd/blob/stable/17.08/hooks/ceph_hooks.py#L524-L527

"No block devices detected using current configuration" when the actual problem is that the ceph-osd processes are not running? I think this is misleading at best.

https://github.com/openstack/charm-ceph-osd/blob/stable/17.08/lib/ceph/utils.py#L1476-L1483
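One way to make the message less misleading would be to report the two conditions separately. The following is a hypothetical sketch: `choose_status` and its three inputs are not part of the charm. It assumes the hook can obtain the configured devices, the devices that were actually prepared, and the PIDs of running ceph-osd processes.

```python
from typing import List, Tuple


def choose_status(configured_devices: List[str],
                  prepared_devices: List[str],
                  running_osd_pids: List[str]) -> Tuple[str, str]:
    """Pick a (workload state, message) pair that names the actual problem.

    Hypothetical helper: it keeps "no devices" and "OSD daemons not
    running" as distinct cases instead of reporting both the same way.
    """
    if not configured_devices:
        return ("blocked", "No block devices configured (check osd-devices)")
    if not prepared_devices:
        return ("blocked", "No block devices detected using current configuration")
    if not running_osd_pids:
        # Devices were osdized, but no ceph-osd daemon is up: say so.
        return ("blocked",
                "{} device(s) prepared but no ceph-osd processes running"
                .format(len(prepared_devices)))
    return ("active", "Unit is ready ({} OSD)".format(len(running_osd_pids)))
```

With this split, the situation in this bug (dev nodes present, ceph-osd@N.service failing) would surface as "prepared but no ceph-osd processes running" rather than as a device-detection problem.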

All of the ceph-osd units have been osdized (device nodes are present):

http://paste.ubuntu.com/25975594/ (note that --filestore has been passed explicitly)

There are lots of ceph-osd service failures (see below):

http://paste.ubuntu.com/25975636/

ceph-osd_4/var/log/syslog:Nov 15 16:39:35 geodude systemd[1]: ceph-osd@0.service: Failed with result 'exit-code'.
ceph-osd_4/var/log/syslog:Nov 15 16:39:56 geodude systemd[1]: ceph-osd@0.service: Failed with result 'exit-code'
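To see how often each unit failed across the collected logs, lines like the two above can be tallied per unit. This is a minimal sketch that assumes the systemd message format shown in these syslog excerpts. `count_unit_failures` is a hypothetical name.

```python
import re
from collections import Counter
from typing import Iterable

# Matches systemd lines such as:
#   ... systemd[1]: ceph-osd@0.service: Failed with result 'exit-code'.
FAILED_RE = re.compile(
    r"systemd\[1\]: (?P<unit>\S+\.service): Failed with result '(?P<result>[^']+)'")


def count_unit_failures(lines: Iterable[str]) -> Counter:
    """Count 'Failed with result' events per (unit, result) pair."""
    counts = Counter()
    for line in lines:
        m = FAILED_RE.search(line)
        if m:
            counts[(m.group("unit"), m.group("result"))] += 1
    return counts
```

Passing an open syslog file works too, e.g. `count_unit_failures(open("/var/log/syslog")).most_common()`.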

=======

I think it is just an apparmor policy issue:

AppArmor DENIED messages -> 3 bugs (the AppArmor policies for ceph-osd, nova-compute and neutron need to be updated):

Nov 15 16:53:52 elgyem kernel: [ 2941.797804] audit: type=1400 audit(1510764832.105:95): apparmor="DENIED" operation="open" profile="/usr/bin/ceph-osd" name="/proc/1201808/auxv" pid=1201808 comm="ceph-osd" requested_mask="r" denied_mask="r" fsuid=0 ouid=0
Nov 15 16:53:52 elgyem kernel: [ 2941.805408] audit: type=1400 audit(1510764832.113:96): apparmor="DENIED" operation="open" profile="/usr/bin/ceph-osd" name="/proc/1201810/auxv" pid=1201810 comm="ceph-osd" requested_mask="r" denied_mask="r" fsuid=0 ouid=0
Nov 15 16:53:52 elgyem kernel: [ 2941.896567] audit: type=1400 audit(1510764832.204:97): apparmor="DENIED" operation="open" profile="/usr/bin/ceph-osd" name="/proc/1201880/auxv" pid=1201880 comm="ceph-osd" requested_mask="r" denied_mask="r" fsuid=0 ouid=0
Nov 15 16:53:52 elgyem kernel: [ 2942.602620] audit: type=1400 audit(1510764832.910:98): apparmor="DENIED" operation="open" profile="/usr/bin/ceph-osd" name="/proc/1202211/auxv" pid=1202211 comm="ceph-osd" requested_mask="r" denied_mask="r" fsuid=0 ouid=0
Nov 15 16:53:54 elgyem kernel: [ 2943.887116] audit: type=1400 audit(1510764834.194:99): apparmor="DENIED" operation="open" profile="/usr/bin/ceph-osd" name="/proc/1202815/auxv" pid=1202815 comm="ceph-osd" requested_mask="r" denied_mask="r" fsuid=0 ouid=0

..

===

Nov 15 16:53:49 elgyem kernel: [ 2939.253295] audit: type=1400 audit(1510764829.561:91): apparmor="DENIED" operation="open" profile="/usr/bin/ceph-osd" name="/proc/1200153/auxv" pid=1200153 comm="ceph-osd" requested_mask="r" denied_mask="r" fsuid=0 ouid=0

Nov 15 16:47:48 elgyem kernel: [ 2577.896060] audit: type=1400 audit(1510764468.209:69): apparmor="DENIED" operation="open" profile="/usr/bin/nova-compute" name="/proc/1018795/status" pid=1018795 comm="nova-compute" requested_mask="r" denied_mask="r" fsuid=114 ouid=114
Nov 15 16:47:48 elgyem kernel: [ 2577.896064] audit: type=1400 audit(1510764468.209:70): apparmor="DENIED" operation="open" profile="/usr/bin/nova-compute" name="/proc/1018795/mounts" pid=1018795 comm="nova-compute" requested_mask="r" denied_mask="r" fsuid=114 ouid=114

Nov 15 16:49:43 ralts kernel: [ 2368.834214] audit: type=1400 audit(1510764583.903:106): apparmor="DENIED" operation="open" profile="/usr/bin/neutron-openvswitch-agent" name="/etc/inputrc" pid=855886 comm="neutron-openvsw" requested_mask="r" denied_mask="r" fsuid=115 ouid=0
Nov 15 16:49:44 ralts kernel: [ 2369.045149] audit: type=1400 audit(1510764584.113:107): apparmor="DENIED" operation="open" profile="/usr/bin/neutron-l3-agent" name="/etc/inputrc" pid=855932 comm="neutron-l3-agen" requested_mask="r" denied_mask="r" fsuid=115 ouid=0
Nov 15 16:49:45 ralts kernel: [ 2370.017694] audit: type=1400 audit(1510764585.086:108): apparmor="DENIED" operation="open" profile="/usr/bin/neutron-openvswitch-agent" name="/usr/share/python-wheels/" pid=855886 comm="neutron-openvsw" requested_mask="r" denied_mask="r" fsuid=115 ouid=0
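Grouping these denials by profile makes it easier to split them into per-charm bugs. The sketch below assumes the key=value audit format shown above, and its function names are hypothetical. It rewrites /proc/<pid>/ paths to use the `@{pid}` variable from AppArmor's stock tunables, so repeated denials from different ceph-osd PIDs collapse into one entry.

```python
import re
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple

# key="value" or key=value pairs in an audit record
KV_RE = re.compile(r'(\w+)=("[^"]*"|\S+)')


def parse_denial(line: str) -> Dict[str, str]:
    """Return the key/value fields of an apparmor="DENIED" audit line, or {}."""
    fields = {k: v.strip('"') for k, v in KV_RE.findall(line)}
    return fields if fields.get("apparmor") == "DENIED" else {}


def group_denials(lines: Iterable[str]) -> Dict[str, Set[Tuple[str, str, str]]]:
    """Map each profile to the set of (operation, path, denied_mask) it was refused."""
    grouped = defaultdict(set)
    for line in lines:
        f = parse_denial(line)
        if not f:
            continue
        # Normalise per-process /proc paths so one rule covers every PID.
        name = re.sub(r"^/proc/\d+/", "/proc/@{pid}/", f.get("name", ""))
        grouped[f["profile"]].add((f.get("operation"), name, f.get("denied_mask")))
    return dict(grouped)
```

On the excerpts above this yields one entry per profile (ceph-osd reading /proc/@{pid}/auxv, nova-compute reading /proc/@{pid}/status and /proc/@{pid}/mounts, the neutron agents reading /etc/inputrc and /usr/share/python-wheels/), which lines up with the three bugs.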

====

I don't think it's a hardware issue. DID_SOFT_ERROR could point to one, but that seems unlikely.

http://kernel.ubuntu.com/git/ubuntu/ubuntu-xenial.git/tree/include/scsi/scsi.h?h=Ubuntu-hwe-4.13.0-16.19_16.04.3#n144

#define DID_SOFT_ERROR 0x0b /* The low level driver just wish a retry */
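The hostbyte and driverbyte values printed in the log lines below are fields packed into the SCSI command's 32-bit result word. This minimal sketch assumes the layout used by the 4.13-era include/scsi/scsi.h (driver byte in bits 24-31, host byte in bits 16-23). Later kernels reworked these macros, and the helper names here are illustrative only.

```python
# Values from the 4.13-era include/scsi/scsi.h
DID_SOFT_ERROR = 0x0B   # host byte: "The low level driver just wish a retry"
DRIVER_SENSE = 0x08     # driver byte: sense data is available


def host_byte(result: int) -> int:
    """Bits 16-23 of the SCSI result word."""
    return (result >> 16) & 0xFF


def driver_byte(result: int) -> int:
    """Bits 24-31 of the SCSI result word."""
    return (result >> 24) & 0xFF


def is_soft_error_with_sense(result: int) -> bool:
    """True for the hostbyte=DID_SOFT_ERROR driverbyte=DRIVER_SENSE combination."""
    return host_byte(result) == DID_SOFT_ERROR and driver_byte(result) == DRIVER_SENSE
```

A soft error with sense data means the low-level driver is asking for a retry and has attached sense data explaining why. That is different from a medium or transport failure.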

Nov 15 16:53:50 elgyem kernel: [ 2940.127691] sd 0:0:2:0: [sdb] tag#0 FAILED Result: hostbyte=DID_SOFT_ERROR driverbyte=DRIVER_SENSE
Nov 15 16:53:50 elgyem kernel: [ 2940.127695] sd 0:0:2:0: [sdb] tag#0 Sense Key : Aborted Command [current]
Nov 15 16:53:50 elgyem kernel: [ 2940.127696] sd 0:0:2:0: [sdb] tag#0 Add. Sense: No additional sense information
Nov 15 16:53:50 elgyem kernel: [ 2940.127699] sd 0:0:2:0: [sdb] tag#0 CDB: ATA command pass through(16) 85 06 20 00 05 00 fe 00 00 00 00 00 00 40 ef 00

Nov 15 16:54:12 elgyem kernel: [ 2961.868073] sd 0:0:1:0: [sda] tag#0 FAILED Result: hostbyte=DID_SOFT_ERROR driverbyte=DRIVER_SENSE
Nov 15 16:54:12 elgyem kernel: [ 2961.868076] sd 0:0:1:0: [sda] tag#0 Sense Key : Aborted Command [current]
Nov 15 16:54:12 elgyem kernel: [ 2961.868078] sd 0:0:1:0: [sda] tag#0 Add. Sense: No additional sense information
Nov 15 16:54:12 elgyem kernel: [ 2961.868081] sd 0:0:1:0: [sda] tag#0 CDB: ATA command pass through(16) 85 06 20 00 05 00 fe 00 00 00 00 00 00 40 ef 00
Nov 15 16:54:12 elgyem kernel: [ 2961.882153] sd 0:0:2:0: [sdb] tag#2 FAILED Result: hostbyte=DID_SOFT_ERROR driverbyte=DRIVER_SENSE
Nov 15 16:54:12 elgyem kernel: [ 2961.882157] sd 0:0:2:0: [sdb] tag#2 Sense Key : Aborted Command [current]
Nov 15 16:54:12 elgyem kernel: [ 2961.882159] sd 0:0:2:0: [sdb] tag#2 Add. Sense: No additional sense information
Nov 15 16:54:12 elgyem kernel: [ 2961.882162] sd 0:0:2:0: [sdb] tag#2 CDB: ATA command pass through(16) 85 06 20 00 05 00 fe 00 00 00 00 00 00 40 ef 00
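The CDB that keeps being aborted is the same in every entry, so decoding it shows what was actually sent to the disks. This sketch assumes the SAT ATA PASS-THROUGH(16) layout in its non-extended form (EXTEND = 0). `decode_ata_pt16` is a hypothetical helper, not a kernel or sg3_utils function.

```python
from typing import Dict


def decode_ata_pt16(cdb_hex: str) -> Dict[str, int]:
    """Decode the main fields of a SAT ATA PASS-THROUGH(16) CDB (non-extended form)."""
    b = bytes.fromhex(cdb_hex)  # whitespace between byte pairs is accepted
    if len(b) != 16 or b[0] != 0x85:
        raise ValueError("not an ATA PASS-THROUGH(16) CDB")
    return {
        "protocol": (b[1] >> 1) & 0x0F,  # bits 4:1 of byte 1; 3 = non-data
        "extend": b[1] & 0x01,           # 0 = 28-bit command, use the (7:0) bytes
        "ck_cond": (b[2] >> 5) & 0x01,   # 1 = return ATA registers as sense data
        "features": b[4],                # FEATURES (7:0)
        "count": b[6],                   # SECTOR COUNT (7:0)
        "device": b[13],
        "command": b[14],                # ATA command opcode
    }
```

For `85 06 20 00 05 00 fe 00 00 00 00 00 00 40 ef 00` this gives a non-data command 0xEF (SET FEATURES) with feature 0x05 (enable Advanced Power Management) and count 0xFE (APM level 254), with CK_COND set. If that reading is right, the aborted commands are power-management requests rather than data I/O, which is consistent with this being unlikely to be a hardware fault.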