Ceph-deploy config pull failed on mongodb role

Bug #1316524 reported by Bogdan Dobrelya
This bug affects 1 person
Affects              Status         Importance  Assigned to        Milestone
Fuel for OpenStack   Invalid        High        Kirill Omelchenko
  5.1.x              Fix Committed  Medium      Dmitry Borodaenko
  6.0.x              Invalid        High        Kirill Omelchenko

Bug Description

Might be related, by error type, to https://bugs.launchpad.net/fuel/+bug/1266853 and https://bugs.launchpad.net/fuel/+bug/1253115

{"build_id": "2014-05-05_00-15-43", "mirantis": "yes", "build_number": "180", "ostf_sha": "134765fcb5a07dce0cd1bb399b2290c988c3c63b", "nailgun_sha": "2de1dcf9fa3fc1521999bff6377eaa6f01d825aa", "production": "docker", "api": "1.0", "fuelmain_sha": "95c35c199c2efc03fb105d090c5a42525430b7b3", "astute_sha": "3cffebde1e5452f5dbf8f744c6525fc36c7afbf3", "release": "5.0", "fuellib_sha": "2348fae80b21c3ec9e5f520395eea2943a510f3d"}

Ubuntu HA: Sahara, Murano, Ceilometer, Ceph volumes & images, Neutron VLAN with public untagged (verification OK)
Nodes and roles:
* controller + ceph osd
* 2 controllers + mongodb
* compute + ceph osd

Issue:
Deployment error during the Puppet run on node-2: ceph-deploy --overwrite-conf config pull node-1 returned 1 instead of one of [0]
(A later manual run succeeded, though.)

Logs from node-2:
<27>May 5 15:45:50 node-2 puppet-user[1090]: (/Stage[main]/Ceph::Conf/Exec[ceph-deploy config pull]/returns) change from notrun to 0 failed: ceph-deploy --overwrite-conf config pull node-1 returned 1 instead of one of [0]
<29>May 5 15:45:50 node-2 puppet-user[1090]: (/Stage[main]/Ceph::Conf/Exec[ceph-deploy gatherkeys remote]) Dependency Exec[ceph-deploy config pull] has failures: true
<28>May 5 15:45:50 node-2 puppet-user[1090]: (/Stage[main]/Ceph::Conf/Exec[ceph-deploy gatherkeys remote]) Skipping because of failed dependencies

Manual run later:
root@node-2:~# ceph-deploy --overwrite-conf config pull node-1
[ceph_deploy.cli][INFO ] Invoked (1.2.7): /usr/bin/ceph-deploy --overwrite-conf config pull node-1
[ceph_deploy.config][DEBUG ] Checking node-1 for /etc/ceph/ceph.conf
[ceph_deploy.sudo_pushy][DEBUG ] will use a remote connection without sudo
[ceph_deploy.config][DEBUG ] Got /etc/ceph/ceph.conf from node-1
root@node-2:~# date
Tue May 6 10:00:30 UTC 2014

Status:
root@node-2:~# ceph osd stat
e24: 4 osds: 4 up, 4 in
root@node-2:~# ceph mon stat
e3: 3 mons at {node-1=192.168.0.2:6789/0,node-2=192.168.0.3:6789/0,node-3=192.168.0.4:6789/0}, election epoch 6, quorum 0,1,2 node-1,node-2,node-3
root@node-2:~# ceph -w
  cluster 83f97c62-8065-4955-9895-61432d586f34
   health HEALTH_WARN clock skew detected on mon.node-2, mon.node-3
   monmap e3: 3 mons at {node-1=192.168.0.2:6789/0,node-2=192.168.0.3:6789/0,node-3=192.168.0.4:6789/0}, election epoch 6, quorum 0,1,2 node-1,node-2,node-3
   osdmap e24: 4 osds: 4 up, 4 in
    pgmap v46: 960 pgs: 960 active+clean; 14464 KB data, 8368 MB used, 189 GB / 197 GB avail
   mdsmap e1: 0/0/1 up

2014-05-06 10:04:43.257902 mon.0 [WRN] mon.1 192.168.0.3:6789/0 clock skew 0.328867s > max 0.05s

Tags: ceph
Bogdan Dobrelya (bogdando) wrote :
Bogdan Dobrelya (bogdando) wrote :
Changed in fuel:
status: New → Confirmed
Changed in fuel:
milestone: 5.0 → 5.1
Dmitry Borodaenko (angdraug) wrote :

node-1's puppet-apply.log starts at 15:47:29, while ceph-deploy config pull failed on node-2 at 15:45:50, 15:46:42, and 15:47:20. The Ceph manifests on node-2 ran and tried to pull the config from the primary controller before the manifests had been applied on the primary controller node.

Andrew Woodward (xarses) wrote :

This is because class ceph is called regardless of role in osnailyfacter, and it performs the config pull regardless of whether the primary controller has run first.

We either need to call ceph per role in osnailyfacter, or otherwise prevent the config pull from running in ceph if the current role isn't what we expect.

Changed in fuel:
status: Confirmed → Triaged
Dmitry Borodaenko (angdraug) wrote :

Since the mongodb role is applied before the primary controller, the ceph manifests should be refactored to check the role before pulling the config, not after.
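A minimal shell sketch of the "check the role before pulling" idea, assuming each Fuel role is deployed in its own run (as the mongodb-before-primary-controller ordering implies). The helper names and the role list are hypothetical, chosen for illustration only; the real fix was made in the Puppet manifests (ceph::conf), not in a shell script.

```shell
#!/bin/sh
# Hypothetical sketch of a role-gated 'ceph-deploy config pull'.
# Role names are assumptions for illustration, not fuel-library's list.

# Succeeds (exit 0) if a node deployed with role $1 should pull ceph.conf.
role_pulls_ceph_conf() {
  case "$1" in
    controller|compute|ceph-osd|cinder)
      # Assumed Ceph-related roles that copy ceph.conf from the primary.
      return 0 ;;
    *)
      # Unrelated roles such as mongo skip the pull. The primary
      # controller is also absent: it is assumed to create ceph.conf
      # itself rather than pull it.
      return 1 ;;
  esac
}

# Pull ceph.conf from primary controller $2 only when role $1 needs it.
maybe_pull_ceph_conf() {
  if role_pulls_ceph_conf "$1"; then
    ceph-deploy --overwrite-conf config pull "$2"
  else
    echo "skipping ceph.conf pull for role $1"
  fi
}
```

With this guard, running maybe_pull_ceph_conf mongo node-1 during the mongodb run on node-2 prints "skipping ceph.conf pull for role mongo" instead of failing against a primary controller that has not been deployed yet.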

Dmitry Borodaenko (angdraug) wrote :
Changed in fuel:
assignee: Fuel Library Team (fuel-library) → Dmitry Borodaenko (dborodaenko)
status: Triaged → In Progress
OpenStack Infra (hudson-openstack) wrote : Fix merged to fuel-library (master)

Reviewed: https://review.openstack.org/92415
Committed: https://git.openstack.org/cgit/stackforge/fuel-library/commit/?id=a375ba2c3665ef7b30b10d2d3174dae77b69b136
Submitter: Jenkins
Branch: master

commit a375ba2c3665ef7b30b10d2d3174dae77b69b136
Author: Dmitry Borodaenko <email address hidden>
Date: Tue May 6 10:22:33 2014 -0700

    Do not pull ceph.conf for roles unrelated to Ceph

    If ceph::conf invokes 'ceph-deploy config pull' before primary
    controller is done (e.g. as part of mongodb role), it will fail and
    disrupt deployment.

    This commit also partially cleans up the mess introduced in
    f50248cea09b320d276cf72321ed29b411fd4b55: only ceph package has a
    service that can be restarted, setting up notifications for anything
    else is meaningless.

    Change-Id: I887bd38209752e634c11d45e4d69e0d24972d5f6
    Closes-bug: #1316524

Changed in fuel:
status: In Progress → Fix Committed
Kirill Omelchenko (komelchenko) wrote : Re: Ceph-deploy config pull issue

Reproduced on 6.0TP RC3:

{
   "build_id": "2014-10-30_14-51-06",
   "ostf_sha": "f47fd1d66a7255213ee075d5c11b8f111f922000",
   "build_number": "64",
   "auth_required": true,
   "api": "1.0",
   "nailgun_sha": "24e2b931d021dbdad21dcc69a77809f78c025004",
   "production": "docker",
   "fuelmain_sha": "79c58417059e6cd919d5342f2625ef7ba3a09bbc",
   "astute_sha": "97eea90efe0a1f17b4934919d6e459d270c10372",
   "feature_groups": [
      "mirantis",
      "techpreview"
   ],
   "release": "6.0-techpreview",
   "release_versions": {
      "2014.2-6.0-techpreview": {
         "VERSION": {
            "build_id": "2014-10-30_14-51-06",
            "ostf_sha": "f47fd1d66a7255213ee075d5c11b8f111f922000",
            "build_number": "64",
            "api": "1.0",
            "nailgun_sha": "24e2b931d021dbdad21dcc69a77809f78c025004",
            "production": "docker",
            "fuelmain_sha": "79c58417059e6cd919d5342f2625ef7ba3a09bbc",
            "astute_sha": "97eea90efe0a1f17b4934919d6e459d270c10372",
            "feature_groups": [
               "mirantis",
               "techpreview"
            ],
            "release": "6.0-techpreview",
            "fuellib_sha": "f43d885914d74fbd062096763222f350f47480e1"
         }
      }
   },
   "fuellib_sha": "f43d885914d74fbd062096763222f350f47480e1"
}

Scenario:
1. Create the following environment:
HA, CentOS, Neutron VLAN with, Ceph for volumes: 3x controllers, 1x compute, 2x ceph
with the following disk configurations:
- compute: http://i.imgur.com/1f8TY3S.png
- ceph-1: http://i.imgur.com/NsG9AMF.png
- ceph-2: http://i.imgur.com/upbJAen.png
2. Deploy cluster

Expected: the cluster deploys successfully.

Actual: deployment succeeds on every node except the compute node, which fails with the following messages in its Puppet log: http://paste.openstack.org/show/127485/

Changed in fuel:
milestone: 5.1 → 6.0
status: Fix Committed → Confirmed
milestone: 6.0 → 5.0.3
milestone: 5.0.3 → 5.1.2
Changed in fuel:
status: Confirmed → Fix Committed
Dmitry Borodaenko (angdraug) wrote :

This is not the same bug: the original was happening on the mongodb role, not on compute.

Please do not resurrect old unrelated bugs without proper investigation. It's better to raise a new bug and later find out that it is a duplicate than to pollute old closed bugs with irrelevant comments that confuse the hell out of users looking for a solution to their problem.

Please do not mess with target milestone settings: it's now impossible to set the target milestone for this bug back to 5.1 to indicate that 5.1 is the Fuel release where this bug was fixed.

Please avoid using paste.openstack.org, it's down more often than it's up.

Please provide all details (most importantly, a diagnostic snapshot) as described here:
https://wiki.openstack.org/wiki/Fuel/How_to_contribute#Test_and_report_bugs

summary: - Ceph-deploy config pull issue
+ Ceph-deploy config pull failed on mongodb role
Nastya Urlapova (aurlapova) wrote :

Kirill, please file a separate issue about your failed environment; it would also be nice if you attached a diagnostic snapshot to it.

Kirill Omelchenko (komelchenko) wrote :

Sorry for confusing you all. Thanks for the advice, Dmitry and Nastya.
Here's a new bug with a diagnostic snapshot: https://bugs.launchpad.net/fuel/+bug/1388749
