Deployment of cluster with Ceph fails with error: "change from notrun to 0 failed: ceph-deploy --overwrite-conf config pull node-8 returned 1 instead of one of [0]"

Bug #1664961 reported by TatyanaGladysheva
Affects: Fuel for OpenStack
Status: Invalid
Importance: Critical
Assigned to: Alexey Stupnikov
Milestone: 8.0-updates

Bug Description

Detailed bug description:
 Deployment of clusters with Ceph nodes fails.

Steps to reproduce:
1. Create the following clusters:
cluster 1:
  1 controller node + 3 ceph-osd nodes
Storage Backends:
  Ceph RBD for volumes (Cinder)
  Ceph RadosGW for objects (Swift API)
  Ceph RBD for ephemeral volumes (Nova)
  Ceph RBD for images (Glance)

cluster 2:
  1 controller node + 2 compute+cinder+ceph-osd nodes
Storage Backends:
  Cinder LVM over iSCSI for volumes
  Ceph RadosGW for objects (Swift API)
  Ceph RBD for ephemeral volumes (Nova)

2. Deploy both clusters (a rough Fuel CLI equivalent of these steps is sketched below)
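
A rough CLI equivalent, run on the Fuel master node. The environment ID, setting key names and flag spellings are from memory of the Fuel 8.0 client, not verified against this exact environment, so check them with --help before relying on them:

  fuel env                             # list environments and note the cluster ID
  fuel settings --env 1 --download     # dumps settings_1.yaml with the storage backend options
  grep -A2 -E 'volumes_ceph|images_ceph|ephemeral_ceph|objects_ceph|volumes_lvm' settings_1.yaml
  fuel settings --env 1 --upload       # after editing settings_1.yaml to match the backends above
  fuel deploy-changes --env 1          # same as pressing "Deploy Changes" in the UI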

Actual results:
Deployment fails for both clusters, with similar errors.
From the Astute log on the master node:
2017-02-15 09:30:01 ERROR [391] Task '{"priority"=>1200, "type"=>"puppet", "id"=>"top-role-ceph-osd", "parameters"=>{"puppet_modules"=>"/etc/puppet/modules", "puppet_manifest"=>"/etc/puppet/modules/osnailyfacter/modular/ceph/ceph-osd.pp", "timeout"=>3600, "cwd"=>"/"}, "uids"=>["9"]}' failed on node 9

From the Puppet log on the failing Ceph node:
2017-02-15 09:30:01 ERR (/Stage[main]/Ceph::Conf/Exec[ceph-deploy config pull]/returns) change from notrun to 0 failed: ceph-deploy --overwrite-conf config pull node-8 returned 1 instead of one of [0]
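
A minimal way to dig further is to re-run the failing Exec by hand on the node that reported the error (node-9 in this log). The working directory and log file name below are assumptions, since ceph-deploy keeps its state and log in the directory it is started from:

  cd /root                                          # assumed ceph-deploy working directory
  ceph-deploy --overwrite-conf config pull node-8   # the exact command from the Puppet Exec above
  echo $?                                           # prints 1, matching the reported failure
  less ceph-deploy-ceph.log                         # assumed log file name; shows the underlying error (SSH access, missing /etc/ceph/ceph.conf on node-8, etc.)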

Expected results:
Deployment should finish successfully.

Reproducibility:
Always

VERSION:
MOS 8.0 + MU4 updates

Additional information:
The issue is not reproducible on a clean 8.0 installation without the MU4 updates.

Changed in fuel:
milestone: none → 8.0-mu-4
importance: Undecided → Critical
assignee: nobody → MOS Maintenance (mos-maintenance)
Revision history for this message
TatyanaGladysheva (tgladysheva) wrote :

Note:
Not all clusters with Ceph are affected by this bug.
For example, the following cluster with Ceph deploys successfully on 8.0 + MU4 (a quick way to verify the resulting replication factor is sketched after the list):
1 controller + ceph-osd
1 compute + ceph-osd
Ceph object replication factor 2

Storage Backends:
  Ceph RBD for volumes (Cinder)
  Ceph RadosGW for objects (Swift API)
  Ceph RBD for ephemeral volumes (Nova)
  Ceph RBD for images (Glance)
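
For comparison with the failing clusters, the effective replication factor on a deployed cluster can be checked from any Ceph node. The pool name below is only an example; the actual pools depend on which backends are enabled:

  ceph osd lspools            # list the pools created during deployment
  ceph osd pool get rbd size  # "size" is the number of object replicas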

Revision history for this message
Alexey Stupnikov (astupnikov) wrote :

Possible duplicate of bug #1625197

Changed in fuel:
assignee: MOS Maintenance (mos-maintenance) → Alexey Stupnikov (astupnikov)
Revision history for this message
Alexey Stupnikov (astupnikov) wrote :

Please ignore the previous comment. Possible duplicate of bug #1604879.

Revision history for this message
Alexey Stupnikov (astupnikov) wrote :

I have tried to reproduce this bug using the steps to reproduce from the description, and it turns out that when the networks are configured correctly (each of the 2 clusters has its own networks), I cannot catch this error. I think there are 3 possible reasons this issue arises:

1. You didn't select the correct Ceph object replication factor for cluster #2.
2. You didn't assign separate Public/Storage/Management networks to the clusters (you used the same networks for both of them); see the sketch after this list for one way to check both points.
3. You reproduced an issue that will be fixed in bug #1604879 (you used Swift and RadosGW simultaneously).
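
One way to check items 1 and 2 from the Fuel master node. The environment IDs, generated file names and the osd_pool_size key are my assumptions about how Fuel 8.0 stores these settings, so adjust them to your environment:

  fuel network --env 1 --download      # writes network_1.yaml
  fuel network --env 2 --download      # writes network_2.yaml
  diff network_1.yaml network_2.yaml   # Public/Storage/Management CIDRs and VLANs must not overlap
  fuel settings --env 2 --download     # writes settings_2.yaml
  grep osd_pool_size settings_2.yaml   # the Ceph replication factor selected for cluster #2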

As a result, I think this bug should be closed as Invalid; please feel free to reopen it if I am wrong.

Changed in fuel:
status: New → Invalid
milestone: 8.0-mu-4 → 8.0-updates