ceph osd ERROR: osd init failed: (1) Operation not permitted

Bug #1710998 reported by kranthi kiran guttikonda
This bug affects 1 person
Affects: openstack-helm
Status: Expired
Importance: Undecided
Assigned to: Unassigned

Bug Description

kubectl version
Client Version: version.Info{Major:"1", Minor:"7", GitVersion:"v1.7.1", GitCommit:"1dc5c66f5dd61da08412a74221ecc79208c2165b", GitTreeState:"clean", BuildDate:"2017-07-14T02:00:46Z", GoVersion:"go1.8.3", Compiler:"gc", Platform:"linux/amd64"}
Server Version: version.Info{Major:"1", Minor:"7", GitVersion:"v1.7.3", GitCommit:"2c2fe6e8278a5db2d15a013987b53968c743f2a1", GitTreeState:"clean", BuildDate:"2017-08-03T06:43:48Z", GoVersion:"go1.8.3", Compiler:"gc", Platform:"linux/amd64"}

openstack-helm multinode deployment

helm version
Client: &version.Version{SemVer:"v2.5.1", GitCommit:"7cf31e8d9a026287041bae077b09165be247ae66", GitTreeState:"clean"}
Server: &version.Version{SemVer:"v2.5.1", GitCommit:"7cf31e8d9a026287041bae077b09165be247ae66", GitTreeState:"clean"}

The ceph-osd pods always fail:

kubectl -n ceph get pods
NAME READY STATUS RESTARTS AGE
ceph-mds-4005287839-fv131 1/1 Running 0 5m
ceph-mon-9ntrt 1/1 Running 0 5m
ceph-mon-check-1448197789-x1zxw 1/1 Running 0 5m
ceph-mon-qbsr5 1/1 Running 0 5m
ceph-mon-qw50f 1/1 Running 0 5m
ceph-osd-6kc7c 0/1 CrashLoopBackOff 5 5m
ceph-osd-ltzsq 0/1 CrashLoopBackOff 5 5m

Below are the logs:

kubectl -n ceph logs -f ceph-osd-6kc7c

+ export LC_ALL=C
+ LC_ALL=C
+ source variables_entrypoint.sh
++ ALL_SCENARIOS='populate_kvstore mon osd osd_directory osd_directory_single osd_ceph_disk osd_ceph_disk_prepare osd_ceph_disk_activate osd_ceph_activate_journal mds rgw rgw_user restapi nfs zap_device mon_health'
++ : ceph
++ : ceph-config/ceph
++ :
++ : osd_directory
++ : 1
++ : kube-contrail-compute2
++ : kube-contrail-compute2
++ : /var/lib/ceph/mon/ceph-kube-contrail-compute2
++ : 1
++ : 0
++ : mds-kube-contrail-compute2
++ : 0
++ : 100
++ : 0
++ : 0
+++ uuidgen
++ : 4190ac7b-f396-4d57-82d4-90cd3b234b32
+++ uuidgen
++ : c296ef19-b966-454b-a193-5f76c0fff260
++ : root=default host=kube-contrail-compute2
++ : 0
++ : cephfs
++ : cephfs_data
++ : 8
++ : cephfs_metadata
++ : 8
++ : kube-contrail-compute2
++ :
++ :
++ : 8080
++ : 0
++ : 9000
++ : 0.0.0.0
++ : cephnfs
++ : 0.0.0.0
++ : 5000
++ : /api/v0.1
++ : warning
++ : /var/log/ceph/ceph-restapi.log
++ : k8s
++ : 127.0.0.1
++ : 4001
++ :
++ :
++ CLI_OPTS='--cluster ceph'
++ DAEMON_OPTS='--cluster ceph --setuser ceph --setgroup ceph -d'
++ MOUNT_OPTS='-t xfs -o noatime,inode64'
++ ETCDCTL_OPTS='--peers 127.0.0.1:4001'
++ [[ k8s == \e\t\c\d ]]
++ MDS_KEYRING=/var/lib/ceph/mds/ceph-mds-kube-contrail-compute2/keyring
++ ADMIN_KEYRING=/etc/ceph/ceph.client.admin.keyring
++ MON_KEYRING=/etc/ceph/ceph.mon.keyring
++ RGW_KEYRING=/var/lib/ceph/radosgw/kube-contrail-compute2/keyring
++ MDS_BOOTSTRAP_KEYRING=/var/lib/ceph/bootstrap-mds/ceph.keyring
++ RGW_BOOTSTRAP_KEYRING=/var/lib/ceph/bootstrap-rgw/ceph.keyring
++ OSD_BOOTSTRAP_KEYRING=/var/lib/ceph/bootstrap-osd/ceph.keyring
++ OSD_PATH_BASE=/var/lib/ceph/osd/ceph
++ MONMAP=/etc/ceph/monmap-ceph
+ source common_functions.sh
++ set -ex
+ source debug.sh
++ set -e
+++ comma_to_space
+++ echo
+ case "$KV_TYPE" in
+ source /config.k8s.sh
++ set -e
++ to_lowercase osd_directory
++ echo osd_directory
+ CEPH_DAEMON=osd_directory
+ create_mandatory_directories
+ for keyring in '$OSD_BOOTSTRAP_KEYRING' '$MDS_BOOTSTRAP_KEYRING' '$RGW_BOOTSTRAP_KEYRING'
++ dirname /var/lib/ceph/bootstrap-osd/ceph.keyring
+ mkdir -p /var/lib/ceph/bootstrap-osd
+ for keyring in '$OSD_BOOTSTRAP_KEYRING' '$MDS_BOOTSTRAP_KEYRING' '$RGW_BOOTSTRAP_KEYRING'
++ dirname /var/lib/ceph/bootstrap-mds/ceph.keyring
+ mkdir -p /var/lib/ceph/bootstrap-mds
+ for keyring in '$OSD_BOOTSTRAP_KEYRING' '$MDS_BOOTSTRAP_KEYRING' '$RGW_BOOTSTRAP_KEYRING'
++ dirname /var/lib/ceph/bootstrap-rgw/ceph.keyring
+ mkdir -p /var/lib/ceph/bootstrap-rgw
+ for directory in mon osd mds radosgw tmp mgr
+ mkdir -p /var/lib/ceph/mon
+ for directory in mon osd mds radosgw tmp mgr
+ mkdir -p /var/lib/ceph/osd
+ for directory in mon osd mds radosgw tmp mgr
+ mkdir -p /var/lib/ceph/mds
+ for directory in mon osd mds radosgw tmp mgr
+ mkdir -p /var/lib/ceph/radosgw
+ for directory in mon osd mds radosgw tmp mgr
+ mkdir -p /var/lib/ceph/tmp
+ for directory in mon osd mds radosgw tmp mgr
+ mkdir -p /var/lib/ceph/mgr
+ mkdir -p /var/lib/ceph/mon/ceph-kube-contrail-compute2
+ mkdir -p /var/run/ceph
+ mkdir -p /var/lib/ceph/radosgw/kube-contrail-compute2
+ mkdir -p /var/lib/ceph/mds/ceph-mds-kube-contrail-compute2
+ mkdir -p /var/lib/ceph/mgr/ceph-
+ chown -R ceph. /var/run/ceph/ /var/lib/ceph/bootstrap-mds /var/lib/ceph/bootstrap-osd /var/lib/ceph/bootstrap-rgw /var/lib/ceph/mds /var/lib/ceph/mgr /var/lib/ceph/mon /var/lib/ceph/osd /var/lib/ceph/radosgw /var/lib/ceph/tmp
+ case "$CEPH_DAEMON" in
+ source start_osd.sh
++ set -ex
++ is_redhat
++ get_package_manager
++ is_available rpm
++ command -v rpm
++ is_available dpkg
++ command -v dpkg
++ OS_VENDOR=ubuntu
++ [[ ubuntu == \r\e\d\h\a\t ]]
++ is_ubuntu
++ get_package_manager
++ is_available rpm
++ command -v rpm
++ is_available dpkg
++ command -v dpkg
++ OS_VENDOR=ubuntu
++ [[ ubuntu == \u\b\u\n\t\u ]]
++ source /etc/default/ceph
+++ TCMALLOC_MAX_TOTAL_THREAD_CACHE_BYTES=134217728
+ OSD_TYPE=directory
+ start_osd
+ get_config
+ log 'k8s: config is stored as k8s secrets.'
+ '[' -z 'k8s: config is stored as k8s secrets.' ']'
++ date '+%F %T'
+ TIMESTAMP='2017-08-15 23:42:59'
+ echo '2017-08-15 23:42:59 /entrypoint.sh: k8s: config is stored as k8s secrets.'
+ return 0
+ check_config
+ [[ ! -e /etc/ceph/ceph.conf ]]
+ '[' 1 -eq 1 ']'
+ get_admin_key
+ log 'k8s: does not generate the admin key. Use Kubernetes secrets instead.'
+ '[' -z 'k8s: does not generate the admin key. Use Kubernetes secrets instead.' ']'
2017-08-15 23:42:59 /entrypoint.sh: k8s: config is stored as k8s secrets.
++ date '+%F %T'
+ TIMESTAMP='2017-08-15 23:42:59'
+ echo '2017-08-15 23:42:59 /entrypoint.sh: k8s: does not generate the admin key. Use Kubernetes secrets instead.'
+ return 0
+ check_admin_key
+ [[ ! -e /etc/ceph/ceph.client.admin.keyring ]]
+ case "$OSD_TYPE" in
+ source osd_directory.sh
2017-08-15 23:42:59 /entrypoint.sh: k8s: does not generate the admin key. Use Kubernetes secrets instead.
++ set -ex
+ source osd_common.sh
+ osd_directory
+ [[ ! -d /var/lib/ceph/osd ]]
+ '[' -z kube-contrail-compute2 ']'
++ find /var/lib/ceph/osd -prune -empty
+ [[ -n '' ]]
+ mkdir -p /etc/forego/ceph
+ echo ''
++ ls /var/lib/ceph/osd
++ sed 's/.*-//'
+ for OSD_ID in '$(ls /var/lib/ceph/osd | sed '\''s/.*-//'\'')'
++ get_osd_path 0
++ echo /var/lib/ceph/osd/ceph-0/
+ OSD_PATH=/var/lib/ceph/osd/ceph-0/
+ OSD_KEYRING=/var/lib/ceph/osd/ceph-0//keyring
+ '[' -n '' ']'
+ '[' -n '' ']'
+ OSD_J=/var/lib/ceph/osd/ceph-0//journal
+ '[' '!' -e /var/lib/ceph/osd/ceph-0//keyring ']'
+ echo 'ceph-0: /usr/bin/ceph-osd --cluster ceph -f -i 0 --osd-journal /var/lib/ceph/osd/ceph-0//journal -k /var/lib/ceph/osd/ceph-0//keyring'
+ tee -a /etc/forego/ceph/Procfile
ceph-0: /usr/bin/ceph-osd --cluster ceph -f -i 0 --osd-journal /var/lib/ceph/osd/ceph-0//journal -k /var/lib/ceph/osd/ceph-0//keyring
+ log SUCCESS
+ '[' -z SUCCESS ']'
++ date '+%F %T'
2017-08-15 23:42:59 /entrypoint.sh: SUCCESS
+ TIMESTAMP='2017-08-15 23:42:59'
+ echo '2017-08-15 23:42:59 /entrypoint.sh: SUCCESS'
+ return 0
+ start_forego
+ exec /usr/local/bin/forego start -f /etc/forego/ceph/Procfile
forego | starting ceph-0.1 on port 5000
ceph-0.1 | starting osd.0 at :/0 osd_data /var/lib/ceph/osd/ceph-0 /var/lib/ceph/osd/ceph-0//journal
ceph-0.1 | 2017-08-15 23:42:59.266204 7f46027e48c0 -1 journal FileJournal::_open: disabling aio for non-block journal. Use journal_force_aio to force use of aio anyway
ceph-0.1 | 2017-08-15 23:42:59.366827 7f46027e48c0 -1 osd.0 173 log_to_monitors {default=true}
ceph-0.1 | 2017-08-15 23:42:59.377318 7f46027e48c0 -1 ** ERROR: osd init failed: (1) Operation not permitted

Tags: ceph-osd
kranthi kiran guttikonda (kranthi-guttikonda9) wrote:

This issue is reproducible with the following steps (a command sketch follows below):

Deploy the ceph helm chart. It will deploy the ceph-osd pods.
Delete the release with helm delete --purge ceph.
Deploy ceph once again and it will hit the error above.
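
A minimal reproduction sketch, assuming the chart lives at ./ceph in the openstack-helm checkout and the release/namespace are both named "ceph" (adjust to your environment):

# Assumptions: chart path ./ceph, Helm v2 syntax, release and namespace "ceph".
helm install --name=ceph --namespace=ceph ./ceph   # first deploy: OSDs come up
helm delete --purge ceph                           # tear the release down; data under /var/lib/openstack-helm stays behind
helm install --name=ceph --namespace=ceph ./ceph   # redeploy: ceph-osd pods go into CrashLoopBackOff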

Alan Meadows (alan-meadows) wrote:

This can sometimes occur when ceph cluster data from a previous instantiation is left behind in /var/lib/openstack-helm. The authentication details from the prior ceph install can hang around, preventing the OSDs from registering with the monitors.

If this is an empty cluster, please try purging /var/lib/openstack-helm across all physical hosts participating in ceph.
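
A minimal purge sketch, assuming the path above and that the cluster holds no data you need to keep (the host names are placeholders):

# WARNING: this destroys all Ceph state managed by openstack-helm on the listed hosts.
# storage-node1..3 are placeholder host names; substitute the hosts participating in ceph.
for host in storage-node1 storage-node2 storage-node3; do
    ssh "$host" 'sudo rm -rf /var/lib/openstack-helm/*'
done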

kranthi kiran guttikonda (kranthi-guttikonda9) wrote:

It does work after deleting /var/lib/openstack-helm, but users will hit the problem again whenever a node goes down and comes back or a pod restarts, and deleting the folder every time is not a good workaround. Is it possible to integrate the cleanup into the scripts that run before the container starts?
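
One possible shape for such a pre-start cleanup, sketched purely as an illustration (not part of the chart; it assumes the admin keyring is mounted in the OSD pod, as the entrypoint trace above suggests):

#!/bin/bash
# Hypothetical pre-start hook: if a directory-backed OSD's stored key no longer
# matches what the monitors hold, treat the directory as stale state from a
# previous deployment and wipe it so the entrypoint can re-bootstrap the OSD.
set -e
for dir in /var/lib/ceph/osd/ceph-*; do
    [ -d "$dir" ] || continue
    id="${dir##*-}"
    local_key=$(ceph-authtool "${dir}/keyring" -n "osd.${id}" --print-key 2>/dev/null || true)
    mon_key=$(ceph --cluster ceph auth get-key "osd.${id}" 2>/dev/null || true)
    if [ -z "$mon_key" ] || [ "$local_key" != "$mon_key" ]; then
        echo "osd.${id}: keyring does not match the monitors, purging ${dir}"
        rm -rf "${dir:?}"/*
    fi
done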

Vijay Kamisetty (vkamiset) wrote:

I see the same issue:

root@node-cs4:/opt/openstack-helm# kubectl describe pods ceph-osd-default-83945928-mldmp -n ceph
Name: ceph-osd-default-83945928-mldmp
Namespace: ceph
Node: node-cs8/10.13.134.39
Start Time: Wed, 03 Oct 2018 17:50:09 -0400
Labels: application=ceph
                component=osd
                controller-revision-hash=4150110849
                pod-template-generation=1
                release_group=ceph
Annotations: configmap-etc-hash=fd0b7c2e53c9ee26c0660d32730598697f8c8191ff40f1871aa78df9a2b222a1
Status: Running
IP: 10.13.134.39
Controlled By: DaemonSet/ceph-osd-default-83945928
Init Containers:
  init:
    Container ID: docker://a27a1564b904f23ab1c29c13a6f45d03a0db32a1dd21e75738b2c5023d8ac74b
    Image: quay.io/stackanetes/kubernetes-entrypoint:v0.3.0
    Image ID: docker-pullable://quay.io/stackanetes/kubernetes-entrypoint@sha256:d08996845ebf117641de89457f28e24c88a15c7bdbb8f026f91e7210f7f9bd11
    Port: <none>
    Command:
      kubernetes-entrypoint
    State: Terminated
      Reason: Completed
      Exit Code: 0
      Started: Thu, 04 Oct 2018 09:28:26 -0400
      Finished: Thu, 04 Oct 2018 09:29:00 -0400
    Ready: True
    Restart Count: 1
    Environment:
      JOURNAL_LOCATION: /var/lib/openstack-helm/ceph/osd/journal-one
      STORAGE_LOCATION: /var/lib/openstack-helm/ceph/osd/osd-one
      JOURNAL_TYPE: directory
      STORAGE_TYPE: directory
      POD_NAME: ceph-osd-default-83945928-mldmp (v1:metadata.name)
      NAMESPACE: ceph (v1:metadata.namespace)
      INTERFACE_NAME: eth0
      PATH: /usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/
      DEPENDENCY_SERVICE: ceph:ceph-mon
      DEPENDENCY_JOBS: ceph-storage-keys-generator,ceph-osd-keyring-generator
      DEPENDENCY_DAEMONSET:
      DEPENDENCY_CONTAINER:
      DEPENDENCY_POD:
      COMMAND: echo done
    Mounts:
      /var/run/secrets/kubernetes.io/serviceaccount from ceph-osd-token-2ls8l (ro)
  ceph-init-dirs:
    Container ID: docker://1a15487a2d546c3355e70bd28397374874431d7e3ff2145e05a61f8c027bf230
    Image: docker.io/ceph/daemon:tag-build-master-luminous-ubuntu-16.04
    Image ID: docker-pullable://ceph/daemon@sha256:687056228e899ecbfd311854e3864db0b46dd4a9a6d4eb4b47c815ca413f25ee
    Port: <none>
    Command:
      /tmp/init-dirs.sh
    State: Terminated
      Reason: Completed
      Exit Code: 0
      Started: Thu, 04 Oct 2018 09:29:02 -0400
      Finished: Thu, 04 Oct 2018 09:29:02 -0400
    Ready: True
    Restart Count: 0
    Environment:
      JOURNAL_LOCATION: /var/lib/openstack-helm/ceph/osd/journal-one
      STORAGE_LOCATION: /var/lib/openstack-helm/ceph/osd/osd-one
      JOURNAL_TYPE: directory
      STORAGE_TYPE: directory
      CLUSTER: ceph
    Mounts:
      /run from pod-run (rw)
      /tmp/init-dirs.sh from ceph-bin (ro)
      /var/lib/ceph from pod-var-lib-ceph (rw)
      /var/run/secrets/kubernetes.io/ser...

Gage Hugo (gagehugo) wrote:

Have there been any updates regarding this?

Changed in openstack-helm:
status: New → Incomplete

Launchpad Janitor (janitor) wrote:

[Expired for openstack-helm because there has been no activity for 60 days.]

Changed in openstack-helm:
status: Incomplete → Expired