After re-deployment of a controller in HA, the 'mon_initial_members' line does not get updated in ceph.conf

Bug #1473076 reported by Oleksandr Liemieshko
This bug affects 1 person

Affects              Status        Importance  Assigned to    Milestone
Fuel for OpenStack   Fix Released  Low         Mykola Golub
6.1.x                Won't Fix     Low         Mykola Golub

Bug Description

After re-deployment of a controller in HA, the 'mon_initial_members' line does not get updated in ceph.conf.

[root@fuel ~]# fuel --f
DEPRECATION WARNING: /etc/fuel/client/config.yaml exists and will be used as the source for settings. This behavior is deprecated. Please specify the path to your custom settings file in the FUELCLIENT_CUSTOM_SETTINGS environment variable.
api: '1.0'
astute_sha: 1ea8017fe8889413706d543a5b9f557f5414beae
auth_required: true
build_id: 2015-06-19_13-02-31
build_number: '525'
feature_groups:
- mirantis
fuel-library_sha: 2e7a08ad9792c700ebf08ce87f4867df36aa9fab
fuel-ostf_sha: 8fefcf7c4649370f00847cc309c24f0b62de718d
fuelmain_sha: a3998372183468f56019c8ce21aa8bb81fee0c2f
nailgun_sha: dbd54158812033dd8cfd7e60c3f6650f18013a37
openstack_version: 2014.2.2-6.1
production: docker
python-fuelclient_sha: 4fc55db0265bbf39c369df398b9dc7d6469ba13b
release: '6.1'
release_versions:
  2014.2.2-6.1:
    VERSION:
      api: '1.0'
      astute_sha: 1ea8017fe8889413706d543a5b9f557f5414beae
      build_id: 2015-06-19_13-02-31
      build_number: '525'
      feature_groups:
      - mirantis
      fuel-library_sha: 2e7a08ad9792c700ebf08ce87f4867df36aa9fab
      fuel-ostf_sha: 8fefcf7c4649370f00847cc309c24f0b62de718d
      fuelmain_sha: a3998372183468f56019c8ce21aa8bb81fee0c2f
      nailgun_sha: dbd54158812033dd8cfd7e60c3f6650f18013a37
      openstack_version: 2014.2.2-6.1
      production: docker
      python-fuelclient_sha: 4fc55db0265bbf39c369df398b9dc7d6469ba13b
      release: '6.1'

Steps to reproduce:
1. HA (3 controllers + 2 computes with Ceph-OSD)
2. CentOS or Ubuntu
3. Neutron VLAN
4. Ceph (used for all storage)
5. Deploy
6. Delete one of the controllers and add a new one

[root@fuel ~]# fuel nodes
id | status | name | cluster | ip | mac | roles | pending_roles | online | group_id
---|--------|------------------|---------|-----------|-------------------|-------------------|---------------|--------|---------
2 | ready | Untitled (49:dd) | 1 | 10.20.0.4 | 08:00:27:34:49:dd | controller | | True | 1
5 | ready | Untitled (a1:a4) | 1 | 10.20.0.6 | 08:00:27:03:a1:a4 | ceph-osd, compute | | True | 1
4 | ready | Untitled (6d:0f) | 1 | 10.20.0.7 | 08:00:27:79:6d:0f | ceph-osd, compute | | True | 1
1 | ready | Untitled (b7:e4) | 1 | 10.20.0.3 | 08:00:27:bc:b7:e4 | controller | | True | 1
6 | ready | Untitled (be:ef) | 1 | 10.20.0.8 | 08:00:27:57:be:ef | controller | | True | 1

Expected result: the 'ceph-mon' service is running on all controllers, and on all nodes 'ceph.conf' contains the new controller's name in the 'mon_initial_members' line.

Actual result:

[root@fuel ~]# for i in $(fuel nodes|grep ready|awk '{print $1}'); do ssh node-$i cat /etc/ceph/ceph.conf |grep mon_initial_members ;done

Warning: Permanently added 'node-2' (RSA) to the list of known hosts.
mon_initial_members = node-1 node-2 node-3

Warning: Permanently added 'node-5' (RSA) to the list of known hosts.
mon_initial_members = node-1 node-2 node-3

Warning: Permanently added 'node-4' (RSA) to the list of known hosts.
mon_initial_members = node-1 node-2 node-3

Warning: Permanently added 'node-1' (RSA) to the list of known hosts.
mon_initial_members = node-1 node-2 node-6

Warning: Permanently added 'node-6' (RSA) to the list of known hosts.
mon_initial_members = node-1 node-2 node-6

[root@node-1 ~]# ceph -s
2015-07-09 12:29:33.664512 7f6eb0846700 0 -- :/1007809 >> 192.168.0.5:6789/0 pipe(0x7f6eac022400 sd=3 :0 s=1 pgs=0 cs=0 l=1 c=0x7f6eac022690).fault
    cluster 981016fb-dbe0-48fd-9bf8-e331f62157e0
     health HEALTH_WARN 1 mons down, quorum 0,1 node-1,node-2
     monmap e3: 3 mons at {node-1=192.168.0.3:6789/0,node-2=192.168.0.4:6789/0,node-3=192.168.0.5:6789/0}, election epoch 24, quorum 0,1 node-1,node-2
     osdmap e98: 4 osds: 4 up, 4 in
      pgmap v5022: 2496 pgs, 12 pools, 1815 MB data, 440 objects
            12087 MB used, 241 GB / 253 GB avail
                2496 active+clean

[root@node-1 ~]# ceph health detail
2015-07-09 12:30:30.658407 7eff441fb700 0 -- :/1010607 >> 192.168.0.5:6789/0 pipe(0x7eff40022370 sd=3 :0 s=1 pgs=0 cs=0 l=1 c=0x7eff40022600).fault
HEALTH_WARN 1 mons down, quorum 0,1 node-1,node-2
mon.node-3 (rank 2) addr 192.168.0.5:6789/0 is down (out of quorum)

[root@fuel ~]# for i in $(fuel nodes|grep controller|awk '{print $1}'); do ssh node-$i ps aux|grep ceph-mon ;done

Warning: Permanently added 'node-2' (RSA) to the list of known hosts.
root 2033 0.1 1.2 246564 49548 ? Sl 09:47 0:09 /usr/bin/ceph-mon -i node-2 --pid-file /var/run/ceph/mon.node-2.pid -c /etc/ceph/ceph.conf --cluster ceph

Warning: Permanently added 'node-1' (RSA) to the list of known hosts.
root 2037 0.1 1.1 240788 45896 ? Sl 09:47 0:09 /usr/bin/ceph-mon -i node-1 --pid-file /var/run/ceph/mon.node-1.pid -c /etc/ceph/ceph.conf --cluster ceph

Warning: Permanently added 'node-6' (RSA) to the list of known hosts.

Revision history for this message
Oleksandr Liemieshko (oliemieshko) wrote :
Changed in fuel:
assignee: nobody → Fuel Library Team (fuel-library)
Revision history for this message
Oleksandr Liemieshko (oliemieshko) wrote :
Changed in fuel:
milestone: none → 6.0.1-updates
milestone: 6.0.1-updates → 7.0
assignee: Fuel Library Team (fuel-library) → Oleksiy Molchanov (omolchanov)
importance: Undecided → High
status: New → Confirmed
Revision history for this message
Oleksiy Molchanov (omolchanov) wrote :

After investigating, I don't see any use for mon_initial_members after the deployment is complete.

https://ceph.com/docs/v0.79/dev/mon-bootstrap/

@Mykola, I am passing this to you to hear your point of view, as you are the Ceph SME.

Changed in fuel:
assignee: Oleksiy Molchanov (omolchanov) → Mykola Golub (mgolub)
Revision history for this message
Mykola Golub (mgolub) wrote :

The mon_initial_members option is used by a monitor only once, on its first start, before it joins the cluster. On subsequent restarts it uses the data stored in its local store.

Thus, mon_initial_members has to be correct only at the moment a new monitor is deployed, and only on the host where it is deployed. From this point of view, there is no bug in the current Fuel behavior: in the provided example, mon_initial_members was correct on the newly deployed node-6.

So there is only a "cosmetic" bug, though one that might lead to confusion. I think there is no need to address it in 6.x, while it might be fixed in 7.x.
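
For illustration (using only commands that already appear in this report): after a monitor's first start, the authoritative member list lives in the monmap, not in ceph.conf, so the two can be compared directly:

   # bootstrap-only setting, read once at the monitor's first start
   grep mon_initial_members /etc/ceph/ceph.conf
   # the monmap the running monitors actually use
   ceph mon stat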

Revision history for this message
Oleksiy Molchanov (omolchanov) wrote :

I am lowering this to Low.

Changed in fuel:
importance: High → Low
assignee: Mykola Golub (mgolub) → Fuel Library Team (fuel-library)
Revision history for this message
Oleksandr Liemieshko (oliemieshko) wrote :

What about the 'ceph-mon' service?
It is not running on the new controller.

Revision history for this message
Oleksiy Molchanov (omolchanov) wrote :

I have tried the same scenario, and it works:

root@node-7:~# ps aux | grep ceph
root 9486 0.0 0.0 10432 624 pts/9 S+ 16:29 0:00 grep --color=auto ceph
root 31042 0.0 1.2 186732 38888 ? Ssl 11:21 0:05 /usr/bin/ceph-mon --cluster=ceph -i node-7 -f

Revision history for this message
Oleksiy Molchanov (omolchanov) wrote :

PS: node-7 is the newly added controller

Revision history for this message
Mykola Golub (mgolub) wrote :

So, the problem is not in mon_initial_members. Here is the problem:

Thu Jul 09 11:02:07 +0000 2015 /Stage[main]/Ceph::Mon/Exec[ceph-deploy mon create] (info): Starting to evaluate the resource
Thu Jul 09 11:02:07 +0000 2015 Exec[ceph-deploy mon create](provider=posix) (debug): Executing check 'ceph mon stat | grep 192.168.0.5'
Thu Jul 09 11:02:07 +0000 2015 Puppet (debug): Executing 'ceph mon stat | grep 192.168.0.5'
Thu Jul 09 11:02:07 +0000 2015 /Stage[main]/Ceph::Mon/Exec[ceph-deploy mon create]/unless (debug): e3: 3 mons at {node-1=192.168.0.3:6789/0,node-2=192.168.0.4:6789/0,node-3=192.168.0.5:6789/0}, election epoch 24, quorum 0,1 node-1,node-2
Thu Jul 09 11:02:07 +0000 2015 /Stage[main]/Ceph::Mon/Exec[ceph-deploy mon create] (info): Evaluated in 0.25 seconds

So, the 'ceph-deploy mon create' command was not executed when deploying node-6. This is because after node-3 was removed and a new controller was added as node-6, node-6 got the IP address previously belonging to node-3 (192.168.0.5). The Puppet script runs the following check to decide whether 'ceph-deploy mon create' can be skipped:

   ceph mon stat | grep ${::internal_address}

The check succeeded above because the monitor map had not been updated yet and still contained 192.168.0.5 assigned to node-3.
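
This is visible in the check output captured above: the grep matched a stale monmap entry, not a monitor running on the new node:

   ceph mon stat | grep 192.168.0.5
   # matches the leftover 'node-3=192.168.0.5:6789/0' entry for the
   # removed controller, so Puppet skipped 'ceph-deploy mon create'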

A workaround: after removing a controller node, wait for the mon map to be updated before redeploying the controller, or restart the monitors.

Revision history for this message
Mykola Golub (mgolub) wrote :

Actually, node-3=192.168.0.5:6789/0 was still in the monmap because it looks like we don't call 'ceph mon remove ${nodeID}' when a controller node is removed. So a workaround is to run this command manually, before or after removing the controller.
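
For the topology in this report that would be (assuming the removed controller's monitor ID is node-3):

   ceph mon remove node-3   # drop the stale monitor from the monmap
   ceph mon stat            # confirm 192.168.0.5 is no longer listed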

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to fuel-library (master)

Fix proposed to branch: master
Review: https://review.openstack.org/202982

Changed in fuel:
assignee: Fuel Library Team (fuel-library) → Mykola Golub (mgolub)
status: Confirmed → In Progress
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to fuel-library (master)

Reviewed: https://review.openstack.org/202982
Committed: https://git.openstack.org/cgit/stackforge/fuel-library/commit/?id=5ab761536c06f35952fb32a11c30aeeff765c892
Submitter: Jenkins
Branch: master

commit 5ab761536c06f35952fb32a11c30aeeff765c892
Author: Mykola Golub <email address hidden>
Date: Fri Jul 17 10:05:25 2015 +0300

    A more stricter check if monitor already deployed

    Previously it just looked for IP and gave a false positive result, when
    a controller node was removed, then added again and got the same IP.

    Change-Id: Ib6eb2e21919b8fd290d9574a1fd735947761c0ff
    Closes-Bug: #1473076
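
The merged change is in the review linked above; as a rough sketch of the idea (not necessarily the exact patch), the 'unless' guard can match the monitor name together with its address, so a reused IP alone no longer gives a false positive:

   # old check: passes if the IP appears anywhere in the monmap
   ceph mon stat | grep ${::internal_address}
   # stricter check: require the name=address pair for this host,
   # e.g. 'node-6=192.168.0.5', which the stale entry
   # 'node-3=192.168.0.5' does not match
   ceph mon stat | grep "${::hostname}=${::internal_address}"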

Changed in fuel:
status: In Progress → Fix Committed
Revision history for this message
Mykola Golub (mgolub) wrote :

I think there is no need to fix this in 6.1, because it can only happen in an unusual case and a workaround exists. For this reason I have also changed 'Importance' to 'Low'.

tags: added: on-verification
Revision history for this message
Andrey Sledzinskiy (asledzinskiy) wrote :

verified on
{
    "build_id": "301",
    "build_number": "301",
    "release_versions": {
        "2015.1.0-7.0": {
            "VERSION": {
                "build_id": "301",
                "build_number": "301",
                "api": "1.0",
                "fuel-library_sha": "5d50055aeca1dd0dc53b43825dc4c8f7780be9dd",
                "nailgun_sha": "4162b0c15adb425b37608c787944d1983f543aa8",
                "feature_groups": ["mirantis"],
                "fuel-nailgun-agent_sha": "d7027952870a35db8dc52f185bb1158cdd3d1ebd",
                "openstack_version": "2015.1.0-7.0",
                "fuel-agent_sha": "50e90af6e3d560e9085ff71d2950cfbcca91af67",
                "production": "docker",
                "python-fuelclient_sha": "486bde57cda1badb68f915f66c61b544108606f3",
                "astute_sha": "6c5b73f93e24cc781c809db9159927655ced5012",
                "fuel-ostf_sha": "2cd967dccd66cfc3a0abd6af9f31e5b4d150a11c",
                "release": "7.0",
                "fuelmain_sha": "a65d453215edb0284a2e4761be7a156bb5627677"
            }
        }
    },
    "auth_required": true,
    "api": "1.0",
    "fuel-library_sha": "5d50055aeca1dd0dc53b43825dc4c8f7780be9dd",
    "nailgun_sha": "4162b0c15adb425b37608c787944d1983f543aa8",
    "feature_groups": ["mirantis"],
    "fuel-nailgun-agent_sha": "d7027952870a35db8dc52f185bb1158cdd3d1ebd",
    "openstack_version": "2015.1.0-7.0",
    "fuel-agent_sha": "50e90af6e3d560e9085ff71d2950cfbcca91af67",
    "production": "docker",
    "python-fuelclient_sha": "486bde57cda1badb68f915f66c61b544108606f3",
    "astute_sha": "6c5b73f93e24cc781c809db9159927655ced5012",
    "fuel-ostf_sha": "2cd967dccd66cfc3a0abd6af9f31e5b4d150a11c",
    "release": "7.0",
    "fuelmain_sha": "a65d453215edb0284a2e4761be7a156bb5627677"
}

Changed in fuel:
status: Fix Committed → Fix Released
tags: removed: on-verification