Can't stop Ceph OSD after fresh install

Bug #1374160 reported by Claude Durocher
This bug affects 7 people
Affects              Status          Importance   Assigned to         Milestone
Fuel for OpenStack   Fix Committed   High         Oleksiy Molchanov
5.1.x                Fix Committed   High         Oleksiy Molchanov
6.0.x                Fix Committed   High         Oleksiy Molchanov
6.1.x                Fix Released    High         Oleksiy Molchanov

Bug Description

Environment:

{"build_id": "2014-09-17_21-40-34", "ostf_sha": "64cb59c681658a7a55cc2c09d079072a41beb346", "build_number": "11", "auth_required": true, "api": "1.0", "nailgun_sha": "eb8f2b358ea4bb7eb0b2a0075e7ad3d3a905db0d", "production": "docker", "fuelmain_sha": "8ef433e939425eabd1034c0b70e90bdf888b69fd", "astute_sha": "f5fbd89d1e0e1f22ef9ab2af26da5ffbfbf24b13", "feature_groups": ["mirantis", "experimental"], "release": "5.1", "release_versions": {"2014.1.1-5.1": {"VERSION": {"build_id": "2014-09-17_21-40-34", "ostf_sha": "64cb59c681658a7a55cc2c09d079072a41beb346", "build_number": "11", "api": "1.0", "nailgun_sha": "eb8f2b358ea4bb7eb0b2a0075e7ad3d3a905db0d", "production": "docker", "fuelmain_sha": "8ef433e939425eabd1034c0b70e90bdf888b69fd", "astute_sha": "f5fbd89d1e0e1f22ef9ab2af26da5ffbfbf24b13", "feature_groups": ["mirantis"], "release": "5.1", "fuellib_sha": "d9b16846e54f76c8ebe7764d2b5b8231d6b25079"}}}, "fuellib_sha": "d9b16846e54f76c8ebe7764d2b5b8231d6b25079"}

Steps to reproduce:

- Deploy 3 controller nodes, 3 compute nodes and 3 Ceph nodes (Ceph used for Glance and Cinder) on Ubuntu 12.04
- SSH to a Ceph node
- Display the OSDs running on the node (ceph osd tree)
- Issue stop ceph-osd id=XX
- Receive the error "stop: Unknown instance: ceph/XX"

Example:

root@node-25:~# ceph osd tree
# id weight type name up/down reweight
-1 16.35 root default
-2 5.45 host node-24
0 1.09 osd.0 up 1
3 1.09 osd.3 up 1
6 1.09 osd.6 up 1
9 1.09 osd.9 up 1
12 1.09 osd.12 up 1
-3 5.45 host node-26
1 1.09 osd.1 up 1
4 1.09 osd.4 up 1
7 1.09 osd.7 up 1
10 1.09 osd.10 up 1
13 1.09 osd.13 up 1
-4 5.45 host node-25
2 1.09 osd.2 up 1
5 1.09 osd.5 up 1
8 1.09 osd.8 up 1
11 1.09 osd.11 up 1
14 1.09 osd.14 up 1
root@node-25:~# stop ceph-osd id=2
stop: Unknown instance: ceph/2
root@node-25:~#
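
To pick a valid id=XX for the stop command, it helps to know which OSDs belong to the node you are logged in to. The following is a minimal sketch, not part of Fuel or Ceph, that groups OSD ids by host from the text of `ceph osd tree`. The function name `osds_by_host` and the constant `SAMPLE_TREE` are hypothetical, and the parsing assumes the column layout shown in this report (an abridged copy of the output above is used as input).

```python
import re

# Abridged copy of the `ceph osd tree` output from this report.
SAMPLE_TREE = """\
# id weight type name up/down reweight
-1 16.35 root default
-3 5.45 host node-26
1 1.09 osd.1 up 1
4 1.09 osd.4 up 1
7 1.09 osd.7 up 1
10 1.09 osd.10 up 1
13 1.09 osd.13 up 1
-4 5.45 host node-25
2 1.09 osd.2 up 1
5 1.09 osd.5 up 1
8 1.09 osd.8 up 1
11 1.09 osd.11 up 1
14 1.09 osd.14 up 1
"""


def osds_by_host(tree_output):
    """Return {host name: [OSD ids]} parsed from `ceph osd tree` text."""
    hosts = {}
    current = None
    for line in tree_output.splitlines():
        fields = line.split()
        if "host" in fields:
            # A "host <name>" row opens a new group of OSD rows.
            idx = fields.index("host")
            if idx + 1 < len(fields):
                current = fields[idx + 1]
                hosts[current] = []
            continue
        if current is None:
            continue
        for field in fields:
            # OSD rows carry a name of the form "osd.<id>".
            match = re.fullmatch(r"osd\.(\d+)", field)
            if match:
                hosts[current].append(int(match.group(1)))
    return hosts
```

Calling `osds_by_host(SAMPLE_TREE)["node-25"]` gives the ids that are candidates for `stop ceph-osd id=XX` on node-25.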

Expected result:

The command should stop the OSD, and the OSD should then be reported as down.

Workaround:

Reboot the Ceph node. After the reboot, the stop ceph-osd command works fine.

Changed in fuel:
assignee: nobody → Fuel for Openstack (fuel)
Changed in fuel:
status: New → Incomplete
importance: Undecided → Low
milestone: none → 6.0
Revision history for this message
Dmitry Borodaenko (angdraug) wrote :

I can't reproduce this in a similar environment with the same ISO version (build 11, aka 5.1 GA):

root@node-1:~# ceph osd tree
# id weight type name up/down reweight
-1 0.18 root default
-2 0.06 host node-2
0 0.03 osd.0 up 1
3 0.03 osd.3 up 1
-3 0.06 host node-3
2 0.03 osd.2 up 1
5 0.03 osd.5 up 1
-4 0.06 host node-1
1 0.03 osd.1 up 1
4 0.03 osd.4 up 1
root@node-1:~# stop ceph-osd id=1
ceph-osd stop/waiting
root@node-1:~# ceph osd tree
# id weight type name up/down reweight
-1 0.18 root default
-2 0.06 host node-2
0 0.03 osd.0 up 1
3 0.03 osd.3 up 1
-3 0.06 host node-3
2 0.03 osd.2 up 1
5 0.03 osd.5 up 1
-4 0.06 host node-1
1 0.03 osd.1 down 1
4 0.03 osd.4 up 1
root@node-1:~# start ceph-osd id=1
ceph-osd (ceph/1) start/running, process 1485
root@node-1:~# ceph osd tree
# id weight type name up/down reweight
-1 0.18 root default
-2 0.06 host node-2
0 0.03 osd.0 up 1
3 0.03 osd.3 up 1
-3 0.06 host node-3
2 0.03 osd.2 up 1
5 0.03 osd.5 up 1
-4 0.06 host node-1
1 0.03 osd.1 up 1
4 0.03 osd.4 up 1

Revision history for this message
Miroslav Anashkin (manashkin) wrote :

Reproduced.

Under Ubuntu, only one OSD per node can be managed through upstart.

Dima, if you had tried to stop osd.4 instead of osd.1 on the same node, you would have got this error.
Only a single OSD per node is registered with Ubuntu upstart. CentOS has no such issue.

root@node-3:~# ceph osd tree
# id weight type name up/down reweight
-1 0.24 root default
-2 0.12 host node-3
0 0.06 osd.0 up 1
1 0.06 osd.1 up 1
-3 0.12 host node-1
2 0.06 osd.2 up 1
3 0.06 osd.3 up 1

root@node-3:~# ps -ef | grep ceph
root 18414 18295 0 19:15 pts/3 00:00:00 grep --color=auto ceph
root 19989 1 0 Oct14 ? 00:03:14 /usr/bin/ceph-osd --cluster=ceph -i 0 -f
root 20830 1 0 Oct14 ? 00:02:51 /usr/bin/ceph-osd -i 1 --pid-file /var/run/ceph/osd.1.pid -c /etc/ceph/ceph.conf --cluster ceph

root@node-3:~# initctl list | grep ceph
ceph-osd-all start/running
ceph-mds-all start/running
ceph-mds-all-starter stop/waiting
ceph-osd-all-starter stop/waiting
ceph-mon-all start/running
ceph-all start/running
ceph-mon-all-starter stop/waiting
ceph-mon stop/waiting
ceph-create-keys stop/waiting
ceph-osd (ceph/0) start/running, process 19989
ceph-mds stop/waiting

root@node-3:~# stop ceph-osd id=1
stop: Unknown instance: ceph/1
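
The mismatch above can be checked mechanically by comparing the OSD ids seen in `ps` with the ceph-osd instances that upstart reports. The following is a minimal sketch under the assumption that both outputs look like the ones pasted in this comment; the function and constant names are hypothetical and not part of any Ceph or Fuel tooling.

```python
import re

# Relevant lines copied from the `ps -ef | grep ceph` output above.
SAMPLE_PS = """\
root 18414 18295 0 19:15 pts/3 00:00:00 grep --color=auto ceph
root 19989 1 0 Oct14 ? 00:03:14 /usr/bin/ceph-osd --cluster=ceph -i 0 -f
root 20830 1 0 Oct14 ? 00:02:51 /usr/bin/ceph-osd -i 1 --pid-file /var/run/ceph/osd.1.pid -c /etc/ceph/ceph.conf --cluster ceph
"""

# Relevant lines copied from the `initctl list | grep ceph` output above.
SAMPLE_INITCTL = """\
ceph-osd-all start/running
ceph-osd-all-starter stop/waiting
ceph-osd (ceph/0) start/running, process 19989
ceph-mds stop/waiting
"""


def running_osd_ids(ps_output):
    """Collect the ids passed with -i to every ceph-osd process."""
    ids = set()
    for line in ps_output.splitlines():
        if "/usr/bin/ceph-osd" not in line:
            continue  # skips the grep process itself
        match = re.search(r"\s-i\s+(\d+)\b", line)
        if match:
            ids.add(int(match.group(1)))
    return ids


def upstart_osd_ids(initctl_output):
    """Collect ids of ceph-osd instances that upstart tracks as running."""
    ids = set()
    for line in initctl_output.splitlines():
        match = re.match(r"\s*ceph-osd \([^/]+/(\d+)\) start/running", line)
        if match:
            ids.add(int(match.group(1)))
    return ids


def unmanaged_osd_ids(ps_output, initctl_output):
    """OSDs that are running but unknown to upstart."""
    return running_osd_ids(ps_output) - upstart_osd_ids(initctl_output)
```

With the sample data from node-3, `unmanaged_osd_ids(SAMPLE_PS, SAMPLE_INITCTL)` contains only osd 1, which is the OSD that `stop ceph-osd id=1` fails to find.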

Changed in fuel:
status: Incomplete → Confirmed
importance: Low → High
tags: added: customer-found library
tags: added: ceph
Revision history for this message
Miroslav Anashkin (manashkin) wrote :

BTW, the workaround with a node reboot works.
After the reboot, the " --pid-file /var/run/ceph/osd.1.pid -c /etc/ceph/ceph.conf --cluster ceph" tail is gone from the osd.1 process command line, and both Ceph OSDs appear in upstart.
After reboot:

root@node-3:~# ps -ef | grep ceph
root 2824 1 4 20:03 ? 00:00:02 /usr/bin/ceph-osd --cluster=ceph -i 1 -f
root 2949 1 3 20:03 ? 00:00:02 /usr/bin/ceph-osd --cluster=ceph -i 0 -f
root 4579 4447 0 20:04 pts/0 00:00:00 grep --color=auto ceph

root@node-3:~# initctl list | grep ceph
ceph-mds-all-starter stop/waiting
ceph-mds-all start/running
ceph-osd-all start/running
ceph-osd-all-starter stop/waiting
ceph-all start/running
ceph-mon-all start/running
ceph-mon-all-starter stop/waiting
ceph-mon stop/waiting
ceph-create-keys stop/waiting
ceph-osd (ceph/0) start/running, process 2949
ceph-osd (ceph/1) start/running, process 2824
ceph-mds stop/waiting

Revision history for this message
Bogdan Dobrelya (bogdando) wrote :

Triaged, since a workaround has been explained.

Revision history for this message
Oleksiy Molchanov (omolchanov) wrote :

I ran a few tests and have a hypothesis.

You cannot run stop ceph-osd id=XX if `ceph-osd` was started using `ceph-deploy osd activate`. To avoid this, I recommend using service { `ceph-osd-all`: ensure => running } in our manifests instead of exec ['ceph-deploy osd activate']. According to http://ceph.com/docs/master/rados/deployment/ceph-deploy-osd/ it does the same thing.

Tested on my env. After patching the master node, the env was deployed successfully, Ceph was working, and a test VM in Horizon worked too.

Does my approach make sense?

Revision history for this message
Dmitry Borodaenko (angdraug) wrote :

Yes, this makes sense, let's try a patch that does that.

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to fuel-library (master)

Fix proposed to branch: master
Review: https://review.openstack.org/135337

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to fuel-library (stable/5.1)

Fix proposed to branch: stable/5.1
Review: https://review.openstack.org/135338

Revision history for this message
Bogdan Dobrelya (bogdando) wrote :

If this patch could fix https://bugs.launchpad.net/bugs/1322230 as well, we should raise it to High and address it in 5.1.1/6.0.

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to fuel-library (master)

Reviewed: https://review.openstack.org/135337
Committed: https://git.openstack.org/cgit/stackforge/fuel-library/commit/?id=cd8409fda9cfacd6cc3500456555cd021d25c37c
Submitter: Jenkins
Branch: master

commit cd8409fda9cfacd6cc3500456555cd021d25c37c
Author: Oleksiy Molchanov <email address hidden>
Date: Tue Nov 18 18:08:21 2014 +0200

    Remove 'ceph-deploy osd activate'

    Remove Exec['ceph-deploy osd activate'] because the same
    is done by Service['ceph'], that is more preferable.

    Change-Id: I47cae0a4f937be565238818b597811f936ca493a
    Closes-Bug: 1374160

Changed in fuel:
status: In Progress → Fix Committed
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to fuel-library (stable/5.1)

Reviewed: https://review.openstack.org/135338
Committed: https://git.openstack.org/cgit/stackforge/fuel-library/commit/?id=5a4254b711fe30d26680948c30225953f96521cb
Submitter: Jenkins
Branch: stable/5.1

commit 5a4254b711fe30d26680948c30225953f96521cb
Author: Oleksiy Molchanov <email address hidden>
Date: Tue Nov 18 18:08:21 2014 +0200

    Remove 'ceph-deploy osd activate'

    Remove Exec['ceph-deploy osd activate'] because the same
    is done by Service['ceph'], that is more preferable.

    Change-Id: I47cae0a4f937be565238818b597811f936ca493a
    Closes-Bug: 1374160

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to fuel-library (stable/6.0)

Fix proposed to branch: stable/6.0
Review: https://review.openstack.org/142784

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to fuel-library (stable/6.0)

Reviewed: https://review.openstack.org/142784
Committed: https://git.openstack.org/cgit/stackforge/fuel-library/commit/?id=fb5837947c25ab76d03b8bb7f666e59719fa0165
Submitter: Jenkins
Branch: stable/6.0

commit fb5837947c25ab76d03b8bb7f666e59719fa0165
Author: Oleksiy Molchanov <email address hidden>
Date: Tue Nov 18 18:08:21 2014 +0200

    Remove 'ceph-deploy osd activate'

    Remove Exec['ceph-deploy osd activate'] because the same
    is done by Service['ceph'], that is more preferable.

    Change-Id: I47cae0a4f937be565238818b597811f936ca493a
    Closes-Bug: 1374160

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Related fix proposed to fuel-library (stable/6.0)

Related fix proposed to branch: stable/6.0
Review: https://review.openstack.org/144144

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Related fix merged to fuel-library (stable/6.0)

Reviewed: https://review.openstack.org/144144
Committed: https://git.openstack.org/cgit/stackforge/fuel-library/commit/?id=2d097f9cdb5a89ef4aba34958201cdc68756ae4f
Submitter: Jenkins
Branch: stable/6.0

commit 2d097f9cdb5a89ef4aba34958201cdc68756ae4f
Author: Vladimir Kuklin <email address hidden>
Date: Fri Dec 26 14:20:19 2014 +0000

    Revert "Remove 'ceph-deploy osd activate'"

    This commit should not have been merged into stable/6.0 branch before 6.0 release.

    Related-bug: #1374160

    This reverts commit fb5837947c25ab76d03b8bb7f666e59719fa0165.

    Change-Id: Ia60cc52320dcbc48fbe1b25d7b867f20f20b4347

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to fuel-library (stable/6.0)

Fix proposed to branch: stable/6.0
Review: https://review.openstack.org/144351

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to fuel-library (stable/6.0)

Reviewed: https://review.openstack.org/144351
Committed: https://git.openstack.org/cgit/stackforge/fuel-library/commit/?id=d603e55e75888913e278c9a07a1a00edba7d7e05
Submitter: Jenkins
Branch: stable/6.0

commit d603e55e75888913e278c9a07a1a00edba7d7e05
Author: Oleksiy Molchanov <email address hidden>
Date: Tue Nov 18 18:08:21 2014 +0200

    Remove 'ceph-deploy osd activate'

    Remove Exec['ceph-deploy osd activate'] because the same
    is done by Service['ceph'], that is more preferable.

    Originally submitted as I47cae0a4f937be565238818b597811f936ca493a,
    had to be temporarily reverted to build 6.0 RC4.

    Change-Id: I3afe1448cdeb9d7c3add97e5bb4e851b4d3b1fa5
    Closes-Bug: 1374160

Revision history for this message
Anastasia Palkina (apalkina) wrote :

Reproduced on ISO #63

"build_id": "2015-01-15_22-54-45", "ostf_sha": "92ad9f8e4c509c82e07ceb093b5d579205c76014", "build_number": "63", "auth_required": true, "api": "1.0", "nailgun_sha": "051d23b22c21eab39c968372a5d40727c2b66281", "production": "docker", "fuelmain_sha": "", "astute_sha": "82125b0eef4e5a758fd4afa8917812e09a1f7dac", "feature_groups": ["mirantis"], "release": "6.1", "release_versions": {"2014.2-6.1": {"VERSION": {"build_id": "2015-01-15_22-54-45", "ostf_sha": "92ad9f8e4c509c82e07ceb093b5d579205c76014", "build_number": "63", "api": "1.0", "nailgun_sha": "051d23b22c21eab39c968372a5d40727c2b66281", "production": "docker", "fuelmain_sha": "", "astute_sha": "82125b0eef4e5a758fd4afa8917812e09a1f7dac", "feature_groups": ["mirantis"], "release": "6.1", "fuellib_sha": "59af43598682f4f0c5aebf584a959ac730a4d86d"}}}, "fuellib_sha": "59af43598682f4f0c5aebf584a959ac730a4d86d"

First deployment:
1. Create new environment (CentOS)
2. Choose nova-network, vlan manager
3. Choose Ceph for images
4. Choose Sahara and Ceilometer
5. Add 1 controller+ceph, 1 compute+ceph, 1 cinder+ceph, 2 mongo
6. Start deployment. It failed with an error on the controller (node-1):

2015-01-16 12:32:36 ERR

 (/Stage[main]/Ceph::Osd/Exec[ceph-deploy osd prepare]/returns) change from notrun to 0 failed: ceph-deploy osd prepare node-1:/dev/sdb4 node-1:/dev/sdc4 returned 1 instead of one of [0]

Second deployment:
1. Create new environment (CentOS)
2. Choose neutron GRE
3. Choose Ceph for images
4. Choose Murano and Ceilometer
5. Add 1 controller+mongo, 1 compute+ceph+cinder, 1 cinder+mongo, 1 ceph
6. Start deployment. It failed with an error on the compute node (node-7):

2015-01-16 12:42:22 ERR

 (/Stage[main]/Ceph::Conf/Exec[ceph-deploy config pull]/returns) change from notrun to 0 failed: ceph-deploy --overwrite-conf config pull node-6 returned 1 instead of one of [0]

Revision history for this message
Anastasia Palkina (apalkina) wrote :
Maksym Strukov (unbelll)
tags: added: on-verification
Revision history for this message
Maksym Strukov (unbelll) wrote :

Can't reproduce with the original steps in 6.1-525.
Also can't reproduce with either of Anastasia's scenarios in 6.1-525.

{"build_id": "2015-06-19_13-02-31", "build_number": "525", "release_versions": {"2014.2.2-6.1": {"VERSION": {"build_id": "2015-06-19_13-02-31", "build_number": "525", "api": "1.0", "fuel-library_sha": "2e7a08ad9792c700ebf08ce87f4867df36aa9fab", "nailgun_sha": "dbd54158812033dd8cfd7e60c3f6650f18013a37", "feature_groups": ["mirantis"], "openstack_version": "2014.2.2-6.1", "production": "docker", "python-fuelclient_sha": "4fc55db0265bbf39c369df398b9dc7d6469ba13b", "astute_sha": "1ea8017fe8889413706d543a5b9f557f5414beae", "fuel-ostf_sha": "8fefcf7c4649370f00847cc309c24f0b62de718d", "release": "6.1", "fuelmain_sha": "a3998372183468f56019c8ce21aa8bb81fee0c2f"}}}, "auth_required": true, "api": "1.0", "fuel-library_sha": "2e7a08ad9792c700ebf08ce87f4867df36aa9fab", "nailgun_sha": "dbd54158812033dd8cfd7e60c3f6650f18013a37", "feature_groups": ["mirantis"], "openstack_version": "2014.2.2-6.1", "production": "docker", "python-fuelclient_sha": "4fc55db0265bbf39c369df398b9dc7d6469ba13b", "astute_sha": "1ea8017fe8889413706d543a5b9f557f5414beae", "fuel-ostf_sha": "8fefcf7c4649370f00847cc309c24f0b62de718d", "release": "6.1", "fuelmain_sha": "a3998372183468f56019c8ce21aa8bb81fee0c2f"}

tags: removed: on-verification