Repeatable deploy failure in dns-server for 2nd Secondary controller

Bug #1584839 reported by Bob Ball
Affects: Fuel for OpenStack
Status: Invalid
Importance: High
Assigned to: Oleksiy Molchanov
Milestone: 8.0-updates

Bug Description

I've got a repeatable failure attempting to deploy an environment with 3 controllers, 5 computes and 1 storage node. Setup and deployment are done through the UI only (not the CLI), and this is an attempt to deploy a full environment (after a Reset). The primary controller deploys successfully, and one of the two secondary controllers succeeds, but the second secondary controller consistently fails with the error:

[460] Task '{"priority"=>3700, "type"=>"puppet", "id"=>"dns-server", "parameters"=>{"puppet_modules"=>"/etc/puppet/modules", "puppet_manifest"=>"/etc/puppet/modules/osnailyfacter/modular/dns/dns-server.pp", "timeout"=>3600, "cwd"=>"/"}, "uids"=>["14"]}' failed on node 14
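For the record, the failing task can be re-run by hand on node 14 using the parameters from the task definition above (a sketch, assuming the standard Fuel modular-task layout; cwd, puppet_modules and puppet_manifest are taken straight from the task JSON):

  cd /
  puppet apply --debug --modulepath=/etc/puppet/modules \
    /etc/puppet/modules/osnailyfacter/modular/dns/dns-server.pp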

Logs on node 14 show an error relating to p_dns:

2016-05-23T14:53:53.183680+00:00 debug: Waiting 600 seconds for Pacemaker to become online
2016-05-23T14:53:53.183680+00:00 debug: Executing '/usr/sbin/crm_attribute -q --type crm_config --query --name dc-version'
2016-05-23T14:53:53.200234+00:00 debug: Executing '/usr/sbin/cibadmin -Q'
2016-05-23T14:53:53.250666+00:00 debug: Pacemaker is online
2016-05-23T14:53:53.282685+00:00 err: (/Stage[main]/Cluster::Dns_ocf/Service[p_dns]) Could not evaluate: Primitive 'p_dns' was not found in CIB!
2016-05-23T14:53:53.282951+00:00 err: (/Stage[main]/Cluster::Dns_ocf/Service[p_dns]) /usr/lib/ruby/vendor_ruby/puppet/util/errors.rb:106:in `fail'
2016-05-23T14:53:53.283859+00:00 err: (/Stage[main]/Cluster::Dns_ocf/Service[p_dns]) /etc/puppet/modules/pacemaker/lib/puppet/provider/service/pacemaker.rb:43:in `name'
2016-05-23T14:53:53.283859+00:00 err: (/Stage[main]/Cluster::Dns_ocf/Service[p_dns]) /etc/puppet/modules/pacemaker/lib/puppet/provider/service/pacemaker.rb:86:in `status'
2016-05-23T14:53:53.283859+00:00 err: (/Stage[main]/Cluster::Dns_ocf/Service[p_dns]) /usr/lib/ruby/vendor_ruby/puppet/type/service.rb:90:in `retrieve'
2016-05-23T14:53:53.283859+00:00 err: (/Stage[main]/Cluster::Dns_ocf/Service[p_dns]) /usr/lib/ruby/vendor_ruby/puppet/type.rb:1048:in `retrieve'
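Whether the primitive actually exists can be checked directly on the node with the standard Pacemaker CLI tools (a minimal sketch; the primitive name p_dns is taken from the error above):

  # Query the CIB for the p_dns primitive; exits non-zero if it is absent
  cibadmin --query --xpath "//primitive[@id='p_dns']"
  # Ask Pacemaker where, if anywhere, the resource is running
  crm_resource --resource p_dns --locate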

Snapshot available from https://citrix.sharefile.com/d-s8f5118bb80546758

Tags: area-library
Changed in fuel:
assignee: nobody → Oleksiy Molchanov (omolchanov)
milestone: none → 9.0
importance: Undecided → High
status: New → Confirmed
tags: added: area-library
Changed in fuel:
status: Confirmed → Invalid
milestone: 9.0 → 8.0-updates
Revision history for this message
Oleksiy Molchanov (omolchanov) wrote:

It seems this issue is not related to the deployment process but to the environment, possibly the network. Pacemaker on node-14 is not in the cluster, which is why it failed. Can you check whether it can reach the primary controller via the storage network?
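Something along these lines run on node-14 would confirm both points (a sketch; the address below is a placeholder for the primary controller's storage-network IP, which depends on your deployment):

  # Is node-14 reported as online in the Pacemaker cluster?
  crm_mon -1 | grep -i online
  # Can node-14 reach the primary controller over the storage network?
  # (192.168.1.2 is a placeholder; substitute the real storage-network address)
  ping -c 3 192.168.1.2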

Changed in fuel:
status: Invalid → Incomplete
Revision history for this message
Bob Ball (bob-ball) wrote:

The failure occurred after a successful deployment, and was followed by another successful deployment after another Reset Environment.
No changes were made to the environment or network configuration between the original successful, subsequently failed, and finally successful deployments.
Network validation was run and reported success; I presume this is sufficient to confirm that node-14 could access the primary controller on the storage network.

I'm not sure what you mean by node-14 not being in the cluster; could you clarify?

Revision history for this message
Ilya Kharin (akscram) wrote:

Info was provided. Moving to Confirmed.

Changed in fuel:
status: Incomplete → Confirmed
Revision history for this message
Oleksiy Molchanov (omolchanov) wrote:

Would you be so kind as to provide a new snapshot? The existing link has expired.

Changed in fuel:
status: Confirmed → Incomplete
Revision history for this message
Oleksiy Molchanov (omolchanov) wrote:

Marking as Invalid because there has been no response for more than a month.

Changed in fuel:
status: Incomplete → Invalid