changing ceph-{public,cluster}-network post deployment is unsupported

Bug #1384341 reported by Nobuto Murata on 2014-10-22
This bug affects 2 people
Affects Status Importance Assigned to Milestone
OpenStack ceph charm
Low
Unassigned
OpenStack ceph-mon charm
Low
Unassigned
ceph (Juju Charms Collection)
Low
Unassigned
ceph-mon (Juju Charms Collection)
Low
Unassigned

Bug Description

lp:~openstack-charmers/charms/trusty/ceph/next
revno. 86

We used the juju bundle at the bottom to set up a multi-network ceph cluster, but ceph.conf uses both 10.3.X.X and 192.168.X.X addresses at the same time. However, ceph-mon only listens on the 10.3.X.X network, so `start ceph-mon-all` never finishes.

tcp 0 0 10.3.0.103:6789 0.0.0.0:* LISTEN 11628/ceph-mon

ceph.conf
==========
[global]

auth cluster required = cephx
auth service required = cephx
auth client required = cephx

keyring = /etc/ceph/$cluster.$name.keyring
mon host = 10.3.0.103:6789 192.168.104.23:6789 192.168.104.24:6789
==========

ceph:
  series: trusty
  services:
    ceph:
      branch: lp:~openstack-charmers/charms/trusty/ceph/next
      constraints: tags=ceph
      num_units: 3
      options:
        fsid: '6547bd3e-1397-11e2-82e5-53567c8d32dc'
        monitor-secret: 'AQCXrnZQwI7KGBAAiPofmKEXKxu5bUzoYLVkbQ=='
        osd-devices: '/dev/vdb'
        osd-reformat: 'yes'
        ceph-cluster-network: '10.2.0.0/24'
        ceph-public-network: '10.3.0.0/24'
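For reference, a consistent deployment with this bundle should end up with every monitor advertising an address on the configured public network. A sketch of what ceph.conf would then contain (the .104/.105 monitor addresses are illustrative, not from the report):

```
[global]
mon host = 10.3.0.103:6789 10.3.0.104:6789 10.3.0.105:6789
public network = 10.3.0.0/24
cluster network = 10.2.0.0/24
```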

Nobuto Murata (nobuto) wrote :
tags: added: cts
Nobuto Murata (nobuto) wrote :

Looks like this is related to bug #1384333. I could work around it with the attached branch, so this might be a duplicate of bug #1384333.

James Page (james-page) wrote :

This looks like a race when using the network support in the charm:

mon host = 10.3.0.103:6789 192.168.104.23:6789 192.168.104.24:6789

10.3.0.103 will only be expecting traffic on 192.168.104.x (or whatever is configured for the 'public' network).

We probably need to switch to not using private-address as the indicator/keys for bootstrap.

Changed in ceph (Juju Charms Collection):
importance: Undecided → Medium
status: New → Triaged
milestone: none → 15.04
James Page (james-page) on 2015-04-23
tags: added: openstack
Changed in ceph (Juju Charms Collection):
milestone: 15.04 → 15.07
James Page (james-page) on 2015-08-10
Changed in ceph (Juju Charms Collection):
milestone: 15.07 → 15.10
Florian Haas (fghaas) wrote :

Using the private address for the mon host looks rather nonsensical when the ceph charm is deployed in a MAAS environment, rendering the cluster network and public network options useless. What happens is that the primary interface address will typically be on the MAAS network, which is *not* the public network, and then what the Mon actually listens on is the MAAS network, not the network given in ceph-public-network. In other words, any Ceph client that is on the public network, but not on the MAAS network, won't be able to use the cluster at all.

This looks like a show stopper bug. Can this be given a higher priority please?

Florian Haas (fghaas) wrote :

Looking into this a little more closely, it appears that the charm does attempt to do the right thing in http://bazaar.launchpad.net/~openstack-charmers/charms/trusty/ceph/trunk/view/head:/hooks/utils.py#L74. However, this only fires on mon-relation-changed, so the only way to get this updated is to remove a unit and then redeploy it.

Even so, the charm makes no attempt to update the monmap, so while peers are instructed to find the mons at the new address, the mon doesn't actually start to *listen* on that address, as it will select its listening IP based not on ceph.conf, but on the monmap.

See also: http://docs.ceph.com/docs/master/rados/operations/add-or-rm-mons/#changing-a-monitor-s-ip-address
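The linked procedure (the "messy way" in the Ceph docs) is what the charm would have to drive to re-IP a monitor; the charm does none of this today. A sketch, with the monitor id and paths illustrative:

```
# Stop the monitor and extract its current monmap
stop ceph-mon id=mon-a
ceph-mon -i mon-a --extract-monmap /tmp/monmap
# Drop the old entry and add the new public-network address
monmaptool --rm mon-a /tmp/monmap
monmaptool --add mon-a 10.3.0.103:6789 /tmp/monmap
# Inject the edited map and restart; repeat for each monitor
ceph-mon -i mon-a --inject-monmap /tmp/monmap
start ceph-mon id=mon-a
```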

That means that if you change ceph-public-network in an existing deployment, and you redeploy all units, what you'll end up with is a "mon hosts" entry in all your hosts' ceph.conf that does not contain a single IP that a mon is actually listening on.

James Page (james-page) wrote :

I dug into this problem a bit more this morning. In the scenario where ceph-public/cluster-network is provided at deployment time (i.e. not changed after deployment), the monitors should all boot from the address they have on the public network; this assumes that the machine the monitor is running on has an IP on the public network - if not, it falls back to the private-address (as detailed in get_public_addr). If one of the machines does not have an IP on the public network, it will pass 'private-address' to its peers, resulting in a mixed-address deployment as described in the original bug report. We should probably switch to a hard configuration choice - if ceph-public-network is provided then no fallback should be used in the event of a mis-configured machine network configuration.
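The "hard configuration choice" described above could look like the following sketch. The function name mirrors the charm's get_public_addr but the signature and inputs are hypothetical, not the charm's actual code: given a unit's addresses and the configured ceph-public-network, return the matching address, and raise instead of silently falling back to private-address.

```python
import ipaddress


def get_public_addr(unit_addrs, public_network=None, private_address=None):
    """Pick the address this unit should advertise to its mon peers.

    If ceph-public-network is configured, insist on an address inside
    that network and raise rather than fall back to private-address --
    the silent fallback is what produced the mixed 'mon host' line in
    this bug.
    """
    if public_network is None:
        # No network configured: legacy behaviour, use private-address.
        return private_address
    net = ipaddress.ip_network(public_network)
    for addr in unit_addrs:
        if ipaddress.ip_address(addr) in net:
            return addr
    raise ValueError(
        "ceph-public-network %s is configured but this unit has no "
        "address on it" % public_network)
```

With this behaviour, a unit whose only addresses are on the MAAS network fails fast instead of publishing a private-address that its peers cannot reach.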

Changing the values of ceph-public-network and ceph-cluster-network post-bootstrap of the cluster is currently not supported by the charm in any way - i.e. it will make no attempt to reconfigure the running daemons to do the magic trick of re-IPing a running storage cluster. Unfortunately this is not made clear in either the README or the config.yaml, and juju does not have the concept of immutable configuration...

James Page (james-page) on 2015-10-22
Changed in ceph (Juju Charms Collection):
milestone: 15.10 → 16.01
James Page (james-page) on 2016-01-28
Changed in ceph (Juju Charms Collection):
milestone: 16.01 → 16.04
James Page (james-page) on 2016-04-22
Changed in ceph (Juju Charms Collection):
milestone: 16.04 → 16.07
Changed in ceph-mon (Juju Charms Collection):
status: New → Triaged
importance: Undecided → Medium
milestone: none → 16.07
Liam Young (gnuoy) on 2016-07-29
Changed in ceph (Juju Charms Collection):
milestone: 16.07 → 16.10
Changed in ceph-mon (Juju Charms Collection):
milestone: 16.07 → 16.10
James Page (james-page) on 2016-10-14
Changed in ceph (Juju Charms Collection):
milestone: 16.10 → 17.01
Changed in ceph-mon (Juju Charms Collection):
milestone: 16.10 → 17.01
James Page (james-page) on 2016-12-13
Changed in ceph-mon (Juju Charms Collection):
milestone: 17.01 → none
Changed in ceph (Juju Charms Collection):
milestone: 17.01 → none
importance: Medium → Low
Changed in ceph-mon (Juju Charms Collection):
importance: Medium → Low
summary: - when enable ceph-public-network and ceph-cluster-network, ceph-mon
- cannot communicate each other
+ changing ceph-{public,cluster}-network post deployment is unsupported
James Page (james-page) on 2017-02-23
Changed in charm-ceph:
importance: Undecided → Low
status: New → Triaged
Changed in ceph (Juju Charms Collection):
status: Triaged → Invalid
Changed in charm-ceph-mon:
importance: Undecided → Low
status: New → Triaged
Changed in ceph-mon (Juju Charms Collection):
status: Triaged → Invalid