Deployment fails with 3 Ceph mon

Bug #1740398 reported by dirane
Affects: kolla-ansible
Status: Won't Fix
Importance: Undecided
Assigned to: Unassigned
Milestone: (none)

Bug Description

Overview
Hi
I'm trying to deploy OpenStack with three controller nodes, but I'm hitting an issue when Kolla bootstraps the Ceph containers: the deployment fails and the OSDs are not created correctly.

Steps to reproduce
Deploy with three controller nodes with Ceph enabled (a sketch of the assumed configuration is shown below).
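
For reference, this is roughly the configuration in play; the exact values are specific to my setup and should be read as assumptions rather than requirements to reproduce:

  # /etc/kolla/globals.yml (excerpt, values assumed)
  enable_ceph: "yes"

  # Each OSD disk is labelled for kolla-ansible's bootstrap step
  # beforehand, e.g. (device name is an example):
  parted /dev/sdb -s -- mklabel gpt mkpart KOLLA_CEPH_OSD_BOOTSTRAP 1 -1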

Expected results
The deployment completes, the three Ceph monitors form a quorum, and the OSDs are created.

Actual results
The deployment fails; I found the following logs:

-> On the compute node

  * ceph-client.admin.log:
monclient(hunting): authenticate timed out after 300
librados: client.admin authentication error (110) Connection timed out

  * docker ps -a:
aa48da67a3c7 private-registry/centos-binary-ceph-osd:5.0.0 "kolla_start" 45 minutes ago Exited (1) 40 minutes ago bootstrap_osd_5
6d12c93c668b private-registry/centos-binary-ceph-osd:5.0.0 "kolla_start" About an hour ago Exited (1) 45 minutes ago bootstrap_osd_4
100e9f6c436b private-registry/centos-binary-ceph-osd:5.0.0 "kolla_start" About an hour ago Exited (1) About an hour ago bootstrap_osd_3
6f82eb8ae898 private-registry/centos-binary-ceph-osd:5.0.0 "kolla_start" About an hour ago Exited (1) About an hour ago bootstrap_osd_2
6b909d87ced2 private-registry/centos-binary-ceph-osd:5.0.0 "kolla_start" About an hour ago Exited (1) About an hour ago bootstrap_osd_1
819914f446fb private-registry/centos-binary-ceph-osd:5.0.0 "kolla_start" About an hour ago Exited (1) About an hour ago bootstrap_osd_0

  * docker logs bootstrap_osd_3:
INFO:main:Loading config file at /var/lib/kolla/config_files/config.json
INFO:main:Validating config file
INFO:main:Kolla config strategy set to: COPY_ALWAYS
INFO:main:Copying service configuration files
INFO:main:Copying /var/lib/kolla/config_files/ceph.conf to /etc/ceph/ceph.conf
INFO:main:Setting permission for /etc/ceph/ceph.conf
INFO:main:Copying /var/lib/kolla/config_files/ceph.client.admin.keyring to /etc/ceph/ceph.client.admin.keyring
INFO:main:Setting permission for /etc/ceph/ceph.client.admin.keyring
INFO:main:Writing out command to execute
Error connecting to cluster: TimedOut
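
A quick sanity check from the compute node is to verify that the monitor port is reachable on the storage network before digging into the monitors themselves (the address is a placeholder for one of the controllers):

  # Ceph monitors listen on TCP 6789 by default
  nc -zv <controller_storage_ip> 6789

If the port is reachable, the timeouts point at the monitors themselves rather than at basic connectivity.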

-> On a controller node, ceph-mon.X.X.X.X.log:
2017-12-28 11:09:21.206387 7f116aeba700 0 log_channel(cluster) log [INF] : mon.X.X.X.X calling new monitor election
2017-12-28 11:09:21.206487 7f116aeba700 1 mon.X.X.X.X@2(electing).elector(439) init, last seen epoch 439
2017-12-28 11:09:36.237530 7f116aeba700 0 log_channel(cluster) log [INF] : mon.X.X.X.X calling new monitor election
2017-12-28 11:09:36.237600 7f116aeba700 1 mon.X.X.X.X@2(electing).elector(441) init, last seen epoch 441
2017-12-28 11:09:51.276675 7f116aeba700 0 log_channel(cluster) log [INF] : mon.X.X.X.X calling new monitor election
2017-12-28 11:09:51.276774 7f116aeba700 1 mon.X.X.X.X@2(electing).elector(443) init, last seen epoch 443
2017-12-28 11:10:06.313344 7f116aeba700 0 log_channel(cluster) log [INF] : mon.X.X.X.X calling new monitor election
2017-12-28 11:10:06.313438 7f116aeba700 1 mon.X.X.X.X@2(electing).elector(445) init, last seen epoch 445
2017-12-28 11:10:18.817119 7f116b6bb700 0 mon.X.X.X.X@2(electing).data_health(0) update_stats avail 94% total 329 GB, used 407 MB, avail 312 GB
2017-12-28 11:10:21.350803 7f116aeba700 0 log_channel(cluster) log [INF] : mon.X.X.X.X calling new monitor election
2017-12-28 11:10:21.350913 7f116aeba700 1 mon.X.X.X.X@2(electing).elector(447) init, last seen epoch 447
2017-12-28 11:10:36.385019 7f116aeba700 0 log_channel(cluster) log [INF] : mon.X.X.X.X calling new monitor election
2017-12-28 11:10:36.385101 7f116aeba700 1 mon.X.X.X.X@2(electing).elector(449) init, last seen epoch 449
2017-12-28 11:10:51.424834 7f116aeba700 0 log_channel(cluster) log [INF] : mon.X.X.X.X calling new monitor election
2017-12-28 11:10:51.424952 7f116aeba700 1 mon.X.X.X.X@2(electing).elector(451) init, last seen epoch 451
2017-12-28 11:11:06.459091 7f116aeba700 0 log_channel(cluster) log [INF] : mon.X.X.X.X calling new monitor election
2017-12-28 11:11:06.459160 7f116aeba700 1 mon.X.X.X.X@2(electing).elector(453) init, last seen epoch 453
2017-12-28 11:11:18.817242 7f116b6bb700 0 mon.X.X.X.X@2(electing).data_health(0) update_stats avail 94% total 329 GB, used 408 MB, avail 312 GB
2017-12-28 11:11:21.499690 7f116aeba700 0 log_channel(cluster) log [INF] : mon.X.X.X.X calling new monitor election
2017-12-28 11:11:21.499799 7f116aeba700 1 mon.X.X.X.X@2(electing).elector(455) init, last seen epoch 455
2017-12-28 11:11:36.534253 7f116aeba700 0 log_channel(cluster) log [INF] : mon.X.X.X.X calling new monitor election
2017-12-28 11:11:36.534364 7f116aeba700 1 mon.X.X.X.X@2(electing).elector(457) init, last seen epoch 457
2017-12-28 11:11:51.577507 7f116aeba700 0 log_channel(cluster) log [INF] : mon.X.X.X.X calling new monitor election
2017-12-28 11:11:51.577589 7f116aeba700 1 mon.X.X.X.X@2(electing).elector(459) init, last seen epoch 459
2017-12-28 11:12:06.613963 7f116aeba700 0 log_channel(cluster) log [INF] : mon.X.X.X.X calling new monitor election
2017-12-28 11:12:06.614036 7f116aeba700 1 mon.X.X.X.X@2(electing).elector(461) init, last seen epoch 461
2017-12-28 11:12:18.817438 7f116b6bb700 0 mon.X.X.X.X@2(electing).data_health(0) update_stats avail 94% total 329 GB, used 410 MB, avail 312 GB
2017-12-28 11:12:21.654153 7f116aeba700 0 log_channel(cluster) log [INF] : mon.X.X.X.X calling new monitor election
2017-12-28 11:12:21.654244 7f116aeba700 1 mon.X.X.X.X@2(electing).elector(463) init, last seen epoch 463
2017-12-28 11:12:36.696281 7f116aeba700 0 log_channel(cluster) log [INF] : mon.X.X.X.X calling new monitor election
2017-12-28 11:12:36.696370 7f116aeba700 1 mon.X.X.X.X@2(electing).elector(465) init, last seen epoch 465
2017-12-28 11:12:51.736531 7f116aeba700 0 log_channel(cluster) log [INF] : mon.X.X.X.X calling new monitor election
2017-12-28 11:12:51.736625 7f116aeba700 1 mon.X.X.X.X@2(electing).elector(467) init, last seen epoch 467
2017-12-28 11:13:06.771501 7f116aeba700 0 log_channel(cluster) log [INF] : mon.X.X.X.X calling new monitor election
2017-12-28 11:13:06.771587 7f116aeba700 1 mon.X.X.X.X@2(electing).elector(469) init, last seen epoch 469
2017-12-28 11:13:18.817649 7f116b6bb700 0 mon.X.X.X.X@2(electing).data_health(0) update_stats avail 94% total 329 GB, used 411 MB, avail 312 GB
2017-12-28 11:13:21.810869 7f116aeba700 0 log_channel(cluster) log [INF] : mon.X.X.X.X calling new monitor election
2017-12-28 11:13:21.810984 7f116aeba700 1 mon.X.X.X.X@2(electing).elector(471) init, last seen epoch 471
2017-12-28 11:13:36.851396 7f116aeba700 0 log_channel(cluster) log [INF] : mon.X.X.X.X calling new monitor election
2017-12-28 11:13:36.851495 7f116aeba700 1 mon.X.X.X.X@2(electing).elector(473) init, last seen epoch 473
2017-12-28 11:13:51.890575 7f116aeba700 0 log_channel(cluster) log [INF] : mon.X.X.X.X calling new monitor election
2017-12-28 11:13:51.890661 7f116aeba700 1 mon.X.X.X.X@2(electing).elector(475) init, last seen epoch 475
2017-12-28 11:14:06.930553 7f116aeba700 0 log_channel(cluster) log [INF] : mon.X.X.X.X calling new monitor election
2017-12-28 11:14:06.930642 7f116aeba700 1 mon.X.X.X.X@2(electing).elector(477) init, last seen epoch 477
2017-12-28 11:14:18.817921 7f116b6bb700 0 mon.X.X.X.X@2(electing).data_health(0) update_stats avail 94% total 329 GB, used 412 MB, avail 312 GB
2017-12-28 11:14:21.971071 7f116aeba700 0 log_channel(cluster) log [INF] : mon.X.X.X.X calling new monitor election
2017-12-28 11:14:21.971178 7f116aeba700 1 mon.X.X.X.X@2(electing).elector(479) init, last seen epoch 479
2017-12-28 11:14:37.012441 7f116aeba700 0 log_channel(cluster) log [INF] : mon.X.X.X.X calling new monitor election
2017-12-28 11:14:37.012516 7f116aeba700 1 mon.X.X.X.X@2(electing).elector(481) init, last seen epoch 481
2017-12-28 11:14:52.057103 7f116aeba700 0 log_channel(cluster) log [INF] : mon.X.X.X.X calling new monitor election
2017-12-28 11:14:52.057181 7f116aeba700 1 mon.X.X.X.X@2(electing).elector(483) init, last seen epoch 483
2017-12-28 11:15:07.092539 7f116aeba700 0 log_channel(cluster) log [INF] : mon.X.X.X.X calling new monitor election
2017-12-28 11:15:07.092627 7f116aeba700 1 mon.X.X.X.X@2(electing).elector(485) init, last seen epoch 485
2017-12-28 11:15:18.818098 7f116b6bb700 0 mon.X.X.X.X@2(electing).data_health(0) update_stats avail 94% total 329 GB, used 414 MB, avail 312 GB
2017-12-28 11:15:22.124296 7f116aeba700 0 log_channel(cluster) log [INF] : mon.X.X.X.X calling new monitor election
2017-12-28 11:15:22.124368 7f116aeba700 1 mon.X.X.X.X@2(electing).elector(487) init, last seen epoch 487
2017-12-28 11:15:37.162572 7f116aeba700 0 log_channel(cluster) log [INF] : mon.X.X.X.X calling new monitor election
2017-12-28 11:15:37.162664 7f116aeba700 1 mon.X.X.X.X@2(electing).elector(489) init, last seen epoch 489
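
The monitors never leave the electing state. The state of each mon can be checked through its admin socket inside the ceph_mon container (the mon name here is the node's storage address, shown as X.X.X.X above):

  docker exec ceph_mon ceph daemon mon.X.X.X.X mon_status
  # If "state" stays "electing" and "quorum" is empty on all three
  # controllers, the cluster never forms a quorum, which matches the
  # client-side "Connection timed out" errors.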

-> On the Kayobe node (during kayobe overcloud service deploy):
fatal: [ctrl03 -> X.X.X.X]: FAILED! => {"changed": false, "cmd": ["docker", "exec", "ceph_mon", "ceph", "auth", "get-or-create", "client.glance", "mon", "allow r", "osd", "allow class-read object_prefix rbd_children, allow rwx pool=images, allow rwx pool=images-cache"], "delta": "0:05:00.186101", "end": "2017-12-28 11:30:29.577681", "failed": true, "rc": 1, "start": "2017-12-28 11:25:29.391580", "stderr": "Error connecting to cluster: TimedOut", "stderr_lines": ["Error connecting to cluster: TimedOut"], "stdout": "", "stdout_lines": []}

PS: I've found a similar issue here: https://bugs.launchpad.net/kolla/+bug/1629237

Thank you in advance for your help

Revision history for this message
dirane (diranetafen) wrote :

Hi,
I've found a workaround, but I still don't know why my change makes it work.
I'm using a deployment tool called Kayobe, which lets me deploy OpenStack easily.
An inventory file is generated automatically, and the deployment is based on it.
When I change the node order (the list of nodes) in that inventory, the deployment works fine; with the original order it no longer works.
My guess is that Kolla tries to bootstrap the Ceph cluster using a predefined order (the list of monitors in ceph.conf) which is not the actual order provided by my inventory file; a rough sketch of that section is below.
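
For reference, the generated /etc/ceph/ceph.conf lists the monitors in a fixed order, roughly like this (addresses are placeholders and the exact template output may differ):

  [global]
  fsid = <cluster fsid>
  mon initial members = X.X.X.1, X.X.X.2, X.X.X.3
  mon host = X.X.X.1, X.X.X.2, X.X.X.3

If that order does not match the order in which the monitors are actually bootstrapped, my suspicion is that they end up disagreeing about the monitor map and keep calling elections without ever reaching quorum.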

Does somebody have an explanation?

Regards
Dirane TAFEN

affects: kolla → kolla-ansible
Mark Goddard (mgoddard)
Changed in kolla-ansible:
status: New → Won't Fix