Pike + Ceph : Missing ceph-mgr container

Bug #1735122 reported by Sebastien
This bug affects 4 people
Affects: kolla-ansible
Status: Expired
Importance: Undecided
Assigned to: Unassigned

Bug Description

I am trying to deploy Pike with Ceph using Ubuntu binary images.

Everything seems fine, except that the Ceph cluster is not in a healthy state:
###
(ceph-mon)[root@ocp1-cn10 /]# ceph -s
  cluster:
    id: c2b871dd-b578-4717-906e-0d5dfb7adfc0
    health: HEALTH_WARN
            no active mgr

  services:
    mon: 3 daemons, quorum 172.16.224.60,172.16.224.61,172.16.224.62
    mgr: no daemons active
    osd: 22 osds: 22 up, 22 in

  data:
    pools: 0 pools, 0 pgs
    objects: 0 objects, 0 bytes
    usage: 0 kB used, 0 kB / 0 kB avail
    pgs:
###

Cinder is also complaining; see https://bugs.launchpad.net/kolla-ansible/+bug/1732833

According to Ceph's documentation (http://docs.ceph.com/docs/master/mgr/), the ceph-mgr daemon has been required for normal operation since the 12.x (Luminous) release.

Tags: ceph pike
Revision history for this message
Sebastien (termeau) wrote :

Creating a keyring for the manager and starting it in the monitor container fixed both Ceph and Cinder.

Manual steps:
############
# Enter the ceph-mon container
docker exec -it -u root ceph_mon bash
# Create the mgr keyring (replace XXX with the manager name, e.g. the short hostname)
ceph --cluster ceph auth get-or-create mgr.XXX mon 'allow profile mgr' osd 'allow *' mds 'allow *' -o /var/lib/ceph/mgr/ceph-XXX/keyring
# Start the manager daemon under the same id
/usr/bin/ceph-mgr -f --cluster ceph --setuser ceph --setgroup ceph --id XXX
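
For convenience, the two steps can be sketched as a single script. The container name ceph_mon comes from this report; the MGR_ID variable and the DRY_RUN guard are hypothetical additions for illustration, not part of the original fix:

```shell
#!/bin/sh
# Sketch of the manual workaround above. ceph_mon is kolla's container
# name from the report; MGR_ID and DRY_RUN are illustrative additions.
# DRY_RUN=1 (the default here) only prints the commands.
MGR_ID="${MGR_ID:-$(hostname -s 2>/dev/null || echo mgr0)}"
DRY_RUN="${DRY_RUN:-1}"

run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "$@"          # dry run: show the command only
    else
        "$@"               # real run: execute it
    fi
}

# Create the mgr keyring inside the monitor container ...
run docker exec -u root ceph_mon ceph --cluster ceph auth get-or-create \
    "mgr.${MGR_ID}" mon 'allow profile mgr' osd 'allow *' mds 'allow *' \
    -o "/var/lib/ceph/mgr/ceph-${MGR_ID}/keyring"
# ... then start the manager daemon under that id.
run docker exec -u root ceph_mon /usr/bin/ceph-mgr -f --cluster ceph \
    --setuser ceph --setgroup ceph --id "${MGR_ID}"
```

Run with DRY_RUN=0 on the node hosting the ceph_mon container to actually apply the workaround.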

Revision history for this message
Sabbir Sakib (sakibsys) wrote :

I have been trying to install OpenStack on 3 bare-metal servers using kolla-ansible v6 for some time and have run into several issues. Without Ceph it works fine, though.

Here is the error I'm getting.

TASK [ceph : Getting ceph mgr keyring] ***********************************************************************************************************************************************************************
failed: [oscontroller01.xyz.pvt -> oscontroller01.xyz.pvt] (item=oscontroller01.xyz.pvt) => {"changed": false, "cmd": ["docker", "exec", "ceph_mon", "ceph", "auth", "get-or-create", "mgr.oscontroller01.xyz.pvt", "mon", "allow profile mgr", "osd", "allow *", "mds", "allow *"], "delta": "0:00:00.260286", "end": "2018-02-15 09:55:31.713964", "item": "oscontroller01.xyz.pvt", "msg": "non-zero return code", "rc": 22, "start": "2018-02-15 09:55:31.453678", "stderr": "Error EINVAL: bad entity name", "stderr_lines": ["Error EINVAL: bad entity name"], "stdout": "", "stdout_lines": []}

Here are my globals.yml and multinode files. I would appreciate it if you could take a look and let me know if I missed something or declared something incorrectly in either file.

globals.yml:
kolla_base_distro: "centos"
kolla_install_type: "binary"
openstack_release: "pike"
kolla_internal_vip_address: "10.88.120.110"
kolla_internal_fqdn: "{{ kolla_internal_vip_address }}"
kolla_external_vip_address: "{{ kolla_internal_vip_address }}"
kolla_external_fqdn: "{{ kolla_external_vip_address }}"
network_interface: "bond0"
kolla_external_vip_interface: "{{ network_interface }}"
api_interface: "{{ network_interface }}"
storage_interface: "{{ network_interface }}"
cluster_interface: "{{ network_interface }}"
tunnel_interface: "{{ network_interface }}"
dns_interface: "{{ network_interface }}"
neutron_external_interface: "bond1"
neutron_plugin_agent: "openvswitch"
openstack_logging_debug: "False"
enable_aodh: "yes"
enable_ceph: "yes"
enable_ceph_rgw: "yes"
enable_cinder: "yes"
enable_fluentd: "yes"
enable_haproxy: "yes"
enable_heat: "yes"
enable_horizon: "yes"
enable_neutron_provider_networks: "yes"
enable_horizon_neutron_lbaas: "{{ enable_neutron_lbaas | bool }}"
enable_neutron_lbaas: "yes"
glance_backend_file: "no"
glance_backend_ceph: "yes"
cinder_backend_ceph: "{{ enable_ceph }}"
nova_backend_ceph: "{{ enable_ceph }}"
nova_compute_virt_type: "kvm"
tempest_image_id:
tempest_flavor_ref_id:
tempest_public_network_id:
tempest_floating_network_name:

multinode file:
# These initial groups are the only groups required to be modified. The
# additional groups are for more control of the environment.
[control]
# These hostnames must be resolvable from your deployment host
oscontroller01.xyz.pvt

# The above can also be specified as follows:
#control[01:03] ansible_user=kolla

# The network nodes are where your l3-agent and loadbalancers will run
# This can be the same as a host in the control group
[network]
oscontroller01.xyz.pvt

# inner-compute is the group of compute nodes which do not have
# external reachability
[inner-compute]
oshyp01.xyz.pvt
oshyp02.xyz.pvt
# external-compute is the group of compute nodes which can reach
# outside
[external-compute]
oshyp01.xyz.pvt
oshyp02.x...

Revision history for this message
Sabbir Sakib (sakibsys) wrote :

Also, how did you solve the ceph-mgr container issue?

Revision history for this message
Benjamin Bendel (benvandamme) wrote :

I have a similar problem in stable/queens. Nine times out of ten the deployment gets stuck at the following task:

TASK [ceph : Getting ceph mgr keyring] ***************************************************
failed: [dir1.maas -> dir1.maas] (item=dir1.maas) => {"changed": false, "cmd": ["docker", "exec", "ceph_mon", "ceph", "auth", "get-or-create", "mgr.dir1.maas", "mon", "allow profile mgr", "osd", "allow *", "mds", "allow *"], "delta": "0:05:00.327827", "end": "2018-08-09 17:06:08.428261", "item": "dir1.maas", "msg": "non-zero return code", "rc": 1, "start": "2018-08-09 17:01:08.100434", "stderr": "[errno 110] error connecting to the cluster", "stderr_lines": ["[errno 110] error connecting to the cluster"], "stdout": "", "stdout_lines": []}
failed: [dir1.maas -> dir1.maas] (item=dir2.maas) => {"changed": false, "cmd": ["docker", "exec", "ceph_mon", "ceph", "auth", "get-or-create", "mgr.dir2.maas", "mon", "allow profile mgr", "osd", "allow *", "mds", "allow *"], "delta": "0:05:00.299945", "end": "2018-08-09 17:11:09.389523", "item": "dir2.maas", "msg": "non-zero return code", "rc": 1, "start": "2018-08-09 17:06:09.089578", "stderr": "[errno 110] error connecting to the cluster", "stderr_lines": ["[errno 110] error connecting to the cluster"], "stdout": "", "stdout_lines": []}
failed: [dir1.maas -> dir1.maas] (item=dir3.maas) => {"changed": false, "cmd": ["docker", "exec", "ceph_mon", "ceph", "auth", "get-or-create", "mgr.dir3.maas", "mon", "allow profile mgr", "osd", "allow *", "mds", "allow *"], "delta": "0:05:00.325244", "end": "2018-08-09 17:16:10.333002", "item": "dir3.maas", "msg": "non-zero return code", "rc": 1, "start": "2018-08-09 17:11:10.007758", "stderr": "[errno 110] error connecting to the cluster", "stderr_lines": ["[errno 110] error connecting to the cluster"], "stdout": "", "stdout_lines": []}

Revision history for this message
Eduardo Gonzalez (egonzalez90) wrote :

Hi, I cannot triage the bug. Are you still having issues? Please provide updated information.

Changed in kolla-ansible:
status: New → Incomplete
Revision history for this message
Sebastien Termeau (st-m) wrote :

The issue is that Ubuntu updated Ceph during the life cycle of the OS. Building Ubuntu images now results in a version of Ceph that kolla-ansible does not expect. I ended up using CentOS instead, and the problem is gone. I don't think it affects Queens.

Revision history for this message
Launchpad Janitor (janitor) wrote :

[Expired for kolla-ansible because there has been no activity for 60 days.]

Changed in kolla-ansible:
status: Incomplete → Expired