Error in generating ceph key prevents deployment of new environment

Bug #1643150 reported by Ken
This bug affects 1 person
Affects: Fuel for OpenStack
Status: Invalid
Importance: Critical
Assigned to: Stanislaw Bogatkin

Bug Description

Detailed bug description:
Hello,
I am using fuel-community-9.0 and have been unable to create a working environment. Each time I deploy a new environment, I get an error when generating ceph master keys.

I looked at /var/log/astute/astute.log and think that, for some reason, astute is not registering that puppet has generated the ceph keys. I can see that the ceph keys have been generated in /var/lib/fuel/keys/<env uid>/ceph; however, they are not in /var/lib/fuel/keys/master/.
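
The observation above can be checked with a short script. This is only a sketch: it assumes the key pair is stored as ceph/ceph and ceph/ceph.pub beneath each directory of /var/lib/fuel/keys, which is the layout suggested by the puppet output quoted later in this report, and the function name `find_ceph_keys` is illustrative rather than part of Fuel.

```python
from pathlib import Path


def find_ceph_keys(keys_root):
    """Map each directory under keys_root to True if it holds a ceph key pair.

    Assumed layout (not confirmed by Fuel documentation):
    <keys_root>/<name>/ceph/ceph and <keys_root>/<name>/ceph/ceph.pub
    """
    result = {}
    root = Path(keys_root)
    if not root.is_dir():
        return result  # nothing to report if the root does not exist
    for entry in sorted(root.iterdir()):
        if entry.is_dir():
            private_key = entry / "ceph" / "ceph"
            public_key = entry / "ceph" / "ceph.pub"
            result[entry.name] = private_key.is_file() and public_key.is_file()
    return result
```

On the master node, `find_ceph_keys("/var/lib/fuel/keys")` would be expected to report the key pair under the environment's directory and nothing under `master`, if the situation is as described.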

Here are the logs:
2016-11-19 01:32:00 DEBUG [1146] Node master(generate_keys_ceph) status: running
2016-11-19 01:32:01 DEBUG [1146] {"nodes"=>[{"status"=>"deploying", "uid"=>"master", "role"=>"generate_keys_ceph"}]}
2016-11-19 01:32:01 DEBUG [1146] Cluster[]: Process node: Node[91]
2016-11-19 01:32:01 DEBUG [1146] Cluster[]: Process node: Node[90]
2016-11-19 01:32:01 DEBUG [1146] Cluster[]: Process node: Node[93]
2016-11-19 01:32:01 DEBUG [1146] Cluster[]: Process node: Node[92]
2016-11-19 01:32:01 DEBUG [1146] Cluster[]: Process node: Node[86]
2016-11-19 01:32:01 DEBUG [1146] Cluster[]: Process node: Node[94]
2016-11-19 01:32:01 DEBUG [1146] Cluster[]: Process node: Node[85]
2016-11-19 01:32:01 DEBUG [1146] Cluster[]: Process node: Node[virtual_sync_node]
2016-11-19 01:32:01 DEBUG [1146] Cluster[]: Start processing all nodes
2016-11-19 01:32:01 DEBUG [1146] Cluster[]: Process node: Node[87]
2016-11-19 01:32:01 DEBUG [1146] Cluster[]: Process node: Node[88]
2016-11-19 01:32:01 DEBUG [1146] Cluster[]: Process node: Node[89]
2016-11-19 01:32:01 DEBUG [1146] Cluster[]: Process node: Node[master]
2016-11-19 01:32:01 DEBUG [1146] Node[master]: Node master: task generate_keys_ceph, task status running
2016-11-19 01:32:01 WARNING [1146] Puppet agent master didn't respond within the allotted time
2016-11-19 01:32:01 DEBUG [1146] Task time summary: generate_keys_ceph with status failed on node master took 00:03:00
2016-11-19 01:32:01 DEBUG [1146] Node[master]: Decreasing node concurrency to: 0
2016-11-19 01:32:01 INFO [1146] Task[generate_haproxy_keys/master]: Run on node: Node[master]
2016-11-19 01:32:01 DEBUG [1146] Node[master]: Increasing node concurrency to: 1
2016-11-19 01:32:01 INFO [1146] Casting message to Nailgun:
{"method"=>"deploy_resp",
 "args"=>
  {"task_uuid"=>"73285015-8a11-4bd4-9041-a6107b2436cd",
   "nodes"=>
    [{"uid"=>"master",
      "status"=>"deploying",
      "progress"=>33,
      "deployment_graph_task_name"=>"generate_keys_ceph",
      "task_status"=>"error",
      "custom"=>
       {:time=>
         {"config_retrieval"=>0.229660536,
          "exec"=>0.217762443,
          "filebucket"=>7.5297e-05,
          "schedule"=>0.000276324,
          "total"=>0.4477746,
          "last_run"=>1479502386},
        :resources=>
         {"changed_resources"=>"Exec[generate_keys_ceph_shell]",
          "failed_resources"=>"",
          "failed"=>0,
          "changed"=>1,
          "total"=>8,
          "restarted"=>0,
          "out_of_sync"=>1,
          "failed_to_restart"=>0,
          "scheduled"=>0,
          "skipped"=>0},
        :changes=>{"total"=>1},
        :events=>{"failure"=>0, "success"=>1, "total"=>1},
        :version=>{"config"=>1479502384, "puppet"=>"3.8.5"},
        :status=>"running",
        :running=>1,
        :enabled=>1,
        :idling=>0,
        :stopped=>0,
        :lastrun=>1479502386,
        :runtime=>16734,
        :output=>"Currently running; last completed run 16734 seconds ago"}}]}}

2016-11-19 01:32:01 DEBUG [1146] Cluster[]: Process node: Node[91]
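
The `last_run` epoch and the `:output` string in the response above can be cross-checked against the log timestamp. The following is a minimal sketch, assuming the astute log timestamps are in UTC (the report does not say so); the function names are illustrative.

```python
from datetime import datetime, timezone


def last_run_utc(epoch_seconds):
    """Convert the puppet "last_run" epoch value to an aware UTC datetime."""
    return datetime.fromtimestamp(epoch_seconds, tz=timezone.utc)


def seconds_since_last_run(epoch_seconds, log_time):
    """Whole seconds between the last completed puppet run and a log entry."""
    return int((log_time - last_run_utc(epoch_seconds)).total_seconds())


# Values copied from the deploy_resp message above.
last_run = 1479502386
# Timestamp of the "Casting message to Nailgun" line, ASSUMED to be UTC.
log_time = datetime(2016, 11, 19, 1, 32, 1, tzinfo=timezone.utc)

print(last_run_utc(last_run).isoformat())          # 2016-11-18T20:53:06+00:00
print(seconds_since_last_run(last_run, log_time))  # 16735
```

If the UTC assumption holds, the computed age is within one second of the 16734 seconds reported in `:output`, so the puppet summary attached to the error response appears to describe a run that completed roughly 4 hours 39 minutes earlier.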

Steps to reproduce:
Create a new environment from the web UI, configure the network and settings, add nodes, verify network connectivity, and deploy.

We have:
 1 controller, cinder node
 3 ceph osd nodes
 1 compute node
 1 mongo ceilometer node
 4 base-os nodes.

Expected results:
A working OpenStack environment.

Actual result:
All nodes end up in an error state, with the problem pointing to an error in the generate_keys_ceph task.

Reproducibility:
Every time.

Workaround:
None.

Impact:
Cannot get an OpenStack system up and running.

Description of the environment:
 Operating system: fuel master node: CentOS 7
 Versions of components: fuel-community-9.0
 Reference architecture: amd64
 Network model: neutron with tunneling segmentation
 Related projects installed: n/a
Additional information:
N/A.

Ken (kchiangusa)
description: updated

Ken (kchiangusa) wrote :

Looking at the logs a bit more, it seems like a race condition. The astute logs indicate that the deployment errors out about 4 minutes before puppet finally creates the ceph key:

astute.log:
2016-11-20 08:27:53 DEBUG [1169] Graph[virtual_sync_node]: Found failed tasks: post_deployment_start, deploy_start, pre_deployment_end, post_deployment_end, deploy_end
2016-11-20 08:27:53 INFO [1169] Cluster[]: All nodes are finished. Failed tasks: Task[generate_keys_ceph/master], Task[generate_keys/master] Stopping the deployment process!

puppet.log:
2016-11-20 08:31:41 +0000 Scope(Class[main]) (notice): MODULAR: generate_keys_ceph
2016-11-20 08:31:41 +0000 Puppet (notice): Compiled catalog for fuel.domain.tld in environment production in 0.05 seconds
2016-11-20 08:31:42 +0000 /Stage[main]/Main/Exec[generate_keys_ceph_shell]/returns (notice): Generating public/private rsa key pair.
2016-11-20 08:31:42 +0000 /Stage[main]/Main/Exec[generate_keys_ceph_shell]/returns (notice): Your identification has been saved in /var/lib/fuel/keys//21/ceph/ceph.
2016-11-20 08:31:42 +0000 /Stage[main]/Main/Exec[generate_keys_ceph_shell]/returns (notice): Your public key has been saved in /var/lib/fuel/keys//21/ceph/ceph.pub.
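
The gap between the two excerpts above can be computed directly from their timestamps. This is a minimal sketch, assuming both logs use the same clock and time zone (puppet.log shows +0000, while astute.log shows no offset); the function name is illustrative.

```python
from datetime import datetime, timedelta

LOG_FORMAT = "%Y-%m-%d %H:%M:%S"


def gap_between(earlier, later):
    """Return the time elapsed between two log timestamps as a timedelta."""
    return datetime.strptime(later, LOG_FORMAT) - datetime.strptime(earlier, LOG_FORMAT)


# astute.log: deployment stopped; puppet.log: key pair written.
gap = gap_between("2016-11-20 08:27:53", "2016-11-20 08:31:42")
print(gap)  # 0:03:49
```

Under that assumption, the key pair was written 3 minutes 49 seconds after astute stopped the deployment.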

Oleksiy Molchanov (omolchanov) wrote :

Please share a full diagnostic snapshot.

Changed in fuel:
status: New → Incomplete

Ken (kchiangusa) wrote :

Sorry about that. Here's the diagnostic snapshot.

Oleksiy Molchanov (omolchanov) wrote :

Ken, sorry for the long delay, I didn't notice your message. Unfortunately, there is no astute.log in the logs you attached.

Ken (kchiangusa) wrote :

Hi,
I assumed that the snapshot would include the astute logs. Sorry for the incompleteness.

Here are the astute logs from the fuel server.

Thanks for your help in looking into this!

Oleksiy Molchanov (omolchanov) wrote :

Ken, I cannot find build_number in your logs. Can you run

cat /etc/fuel_build_number

on the master node?

Ken (kchiangusa) wrote :

Hello Oleksiy,

The build number is 4070.

Thanks!

Oleksiy Molchanov (omolchanov) wrote :

Hi Ken, can you try the latest ISO:

http://seed.fuel-infra.org/fuelweb-iso/fuel-9.0-community-5101-2016-12-28_02-00-00.iso.torrent?from=status

As far as I can see, it passed the tests that deploy Ceph storage.

If you face this issue again, please generate a diagnostic snapshot and share it here in the comments.

Ken (kchiangusa) wrote :

Hello,
Is there a way to get the ISO directly instead of via torrent? It has been downloading for a long time and hasn't made any progress.

Oleksiy Molchanov (omolchanov) wrote :

AFAIK, it is the only way. Can you try from here:

https://www.fuel-infra.org/release/status

Changed in fuel:
importance: Undecided → Critical
milestone: none → 9.2
assignee: nobody → Stanislaw Bogatkin (sbogatkin)
Changed in fuel:
milestone: 9.2 → 9.3

Xiwen Deng (deng-xiwen) wrote :

Hello, I have faced the same issue in MOS 9.0. My setup has two OpenStack environments: one deployed successfully, but the other failed to deploy.
Is there any way to solve this issue?

Denis Meltsaykin (dmeltsaykin) wrote :

Xiwen, the statement that you've faced the same issue without logs and any details is not helpful. Please provide a diagnostic snapshot.

Changed in fuel:
status: Incomplete → Invalid