[network] Ceph monitor doesn't bootstrap in a standalone deployment

Bug #1824993 reported by Francesco Pantano
This bug affects 2 people
Affects: tripleo
Status: Incomplete
Importance: High
Assigned to: Unassigned
Milestone: (not listed)

Bug Description

When deploying a Ceph Nautilus cluster in an RDO standalone scenario, an issue appears during the monitor bootstrap phase.
In particular, with the standard, supported network configuration, the mon is unable to start an election and form a quorum because it is stuck repeating the following messages:

7fb411007700 0 -- [v2:192.168.24.1:3300/0,v1:192.168.24.1:6789/0] send_to message mon_probe(probe 4b5c8c0a-ff60-454b-a1b4-9747aa737d19 name standalone new mon_release 14) v7 with empty dest
debug 2019-04-09 07:47:25.826 7fb411007700 0 -- [v2:192.168.24.1:3300/0,v1:192.168.24.1:6789/0] send_to message mon_probe(probe 4b5c8c0a-ff60-454b-a1b4-9747aa737d19 name standalone new mon_release 14) v7 with empty dest
debug 2019-04-09 07:47:27.826 7fb411007700 0 -- [v2:192.168.24.1:3300/0,v1:192.168.24.1:6789/0] send_to message mon_probe(probe 4b5c8c0a-ff60-454b-a1b4-9747aa737d19 name standalone new mon_release 14) v7 with empty dest
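
One way to confirm this symptom in a collected monitor log is to count the repeated probe messages. The following is only a minimal sketch: the pattern is derived solely from the lines quoted above, and the function names and the threshold are hypothetical, not part of Ceph or TripleO.

```python
import re

# Pattern built only from the log lines quoted in this report:
# a mon_probe message that the messenger reports as sent "with empty dest".
EMPTY_DEST_PROBE = re.compile(
    r"send_to message mon_probe\(probe (?P<uuid>[0-9a-f-]{36}) "
    r"name (?P<name>\S+) .*?\) v\d+ with empty dest"
)


def find_empty_dest_probes(log_text):
    """Return (uuid, mon_name) for every probe logged 'with empty dest'."""
    return [
        (match.group("uuid"), match.group("name"))
        for match in EMPTY_DEST_PROBE.finditer(log_text)
    ]


def looks_stuck(log_text, threshold=3):
    """Heuristic (assumption): several such probes suggest the mon is not electing."""
    return len(find_empty_dest_probes(log_text)) >= threshold


# Two of the lines from the report, used as sample input.
SAMPLE = (
    "debug 2019-04-09 07:47:25.826 7fb411007700 0 -- "
    "[v2:192.168.24.1:3300/0,v1:192.168.24.1:6789/0] send_to message "
    "mon_probe(probe 4b5c8c0a-ff60-454b-a1b4-9747aa737d19 name standalone "
    "new mon_release 14) v7 with empty dest\n"
    "debug 2019-04-09 07:47:27.826 7fb411007700 0 -- "
    "[v2:192.168.24.1:3300/0,v1:192.168.24.1:6789/0] send_to message "
    "mon_probe(probe 4b5c8c0a-ff60-454b-a1b4-9747aa737d19 name standalone "
    "new mon_release 14) v7 with empty dest\n"
)

if __name__ == "__main__":
    for uuid, name in find_empty_dest_probes(SAMPLE):
        print(f"mon {name} probed with empty dest (uuid {uuid})")
```

A low threshold is used here only because the sample is short; a real log would need a value chosen for the observed probe interval.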

When the same jobs are run with the br-ex hack described here [1]:

https://review.openstack.org/#/c/651231/2/deployment/ceph-ansible/ceph-base.yaml

the cluster is able to perform the election and the deployment completes successfully.

Here is an example of a red CI run [2] from a standalone execution:

[2] RED CI:
https://logs.rdoproject.org/21/18721/29/check/rdoinfo-tripleo-stein-centos-7-scenario001-standalone/28c108e/logs/undercloud/home/zuul/undercloud-ansible-tjWQlx/ceph-ansible/ceph_ansible_command.log.txt.gz

description: updated
summary: - Ceph monitor doesn't bootstrap in a standalone deployment
+ [network] Ceph monitor doesn't bootstrap in a standalone deployment
Changed in tripleo:
status: New → Triaged
importance: Undecided → High
milestone: none → train-1
Changed in tripleo:
milestone: train-1 → train-2
Changed in tripleo:
milestone: train-2 → train-3
John Fulton (jfulton-org) wrote :

WORKAROUND

parameter_defaults:
  CephAnsibleExtraConfig:
    mon_host_v1: { 'enabled': False }
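
As an illustration only, the workaround above could be saved as its own environment file and passed to the deployment with the other environment files (for example with `-e`). This sketch assumes a hypothetical file name, `ceph-mon-v1-workaround.yaml`, and a hypothetical helper function; the YAML content is exactly the snippet quoted above.

```python
from pathlib import Path

# The workaround exactly as quoted in this report.
WORKAROUND_ENV = """\
parameter_defaults:
  CephAnsibleExtraConfig:
    mon_host_v1: { 'enabled': False }
"""


def write_workaround_env(path):
    """Write the workaround to an environment file and return its path."""
    target = Path(path)
    target.write_text(WORKAROUND_ENV)
    return target


if __name__ == "__main__":
    # Hypothetical file name; choose whatever fits the deployment layout.
    env_file = write_workaround_env("ceph-mon-v1-workaround.yaml")
    print(f"wrote {env_file}")
```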

Changed in tripleo:
milestone: train-3 → ussuri-1
Changed in tripleo:
milestone: ussuri-1 → ussuri-2
wes hayutin (weshayutin)
Changed in tripleo:
milestone: ussuri-2 → ussuri-3
wes hayutin (weshayutin)
Changed in tripleo:
milestone: ussuri-3 → ussuri-rc3
wes hayutin (weshayutin)
Changed in tripleo:
status: Triaged → Incomplete
wes hayutin (weshayutin)
Changed in tripleo:
milestone: ussuri-rc3 → victoria-1
Changed in tripleo:
milestone: victoria-1 → victoria-3
John Fulton (jfulton-org) wrote :

This bug was only encountered on standalone deployments (single-server deployments with one mon). TripleO CI's standalone scenarios still use the workaround in Victoria [1][2]. The branch that will become Wallaby does not use the workaround [3] because it uses cephadm [4]. Standalone deployments are used for CI jobs with constrained resources and for developer environments.

This bug was never encountered on multinode deployments with supported configurations like 3 monitors.

[1] https://github.com/openstack/tripleo-heat-templates/blob/stable/victoria/ci/environments/scenario001-standalone.yaml#L82

[2] https://github.com/openstack/tripleo-heat-templates/blob/stable/victoria/ci/environments/scenario004-standalone.yaml#L54

[3] https://github.com/openstack/tripleo-heat-templates/blob/master/ci/environments/scenario001-standalone.yaml

[4] https://specs.openstack.org/openstack/tripleo-specs/specs/wallaby/tripleo-ceph.html
