[library] Deployment with 5 controllers and many other nodes has failed. Timeout is exceeded.

Bug #1333709 reported by Anastasia Palkina
This bug affects 2 people
Affects: Fuel for OpenStack
Status: Confirmed
Importance: High
Assigned to: Fuel Library (Deprecated)

Bug Description

"build_id": "2014-06-23_00-31-14",
"mirantis": "yes",
"build_number": "265",
"ostf_sha": "429c373fb79b1073aa336bc62c6aad45a8f93af6",
"nailgun_sha": "eaabb2c9bbe8e921aaa231960dcda74a7bc86213",
"production": "docker",
"api": "1.0",
"fuelmain_sha": "4394ca9be6540d652cc3919556633d9381e0db64",
"astute_sha": "694b5a55695e01e1c42185bfac9cc7a641a9bd48",
"release": "5.1",
"fuellib_sha": "dc2713b3ba20ccff2816cf61e8481fe2f17ed69b"

1. Create new environment (Ubuntu, HA mode)
2. Choose GRE segmentation
3. Choose both Ceph options
4. Choose to install Ceilometer
5. Add 5 controllers, 1 compute, 1 cinder+mongo, 2 ceph
6. Start deployment. It fails: the deployment timeout is exceeded.

The first controller took 1.5 hours to deploy, which seems too long. Is this normal for this configuration?

In the logs, the controllers are node-4, node-5, node-6, node-7 and node-8.

There are also errors in the astute log:
2014-06-24 12:55:49 ERR

[388] fe55e0a1-9790-47c9-8aa0-d395f69fd6e4: cmd: ruby -r 'yaml' -e 'y = YAML.load_file("/etc/astute.yaml"); y["nodes"] = YAML.load_file("/tmp/astute.yaml"); File.open("/etc/astute.yaml", "w") { |f| f.write y.to_yaml }'; puppet apply --logdest syslog --debug -e '$settings=parseyaml($::astute_settings_yaml) $nodes_hash=$settings["nodes"] class {"l23network::hosts_file": nodes => $nodes_hash }'
                                               mcollective error: fe55e0a1-9790-47c9-8aa0-d395f69fd6e4: MCollective agents '3' didn't respond within the allotted time.

2014-06-24 12:55:49 ERR

[388] MCollective agents '3' didn't respond within the allotted time.

Anastasia Palkina (apalkina) wrote :
description: updated
Changed in fuel:
status: New → Confirmed
importance: Undecided → High
Denis Ipatov (dipatov)
tags: added: customer-found
Changed in fuel:
assignee: Fuel Library Team (fuel-library) → Sergii Golovatiuk (sgolovatiuk)
assignee: Sergii Golovatiuk (sgolovatiuk) → nobody
assignee: nobody → Sergii Golovatiuk (sgolovatiuk)
Sergii Golovatiuk (sgolovatiuk) wrote :

The deployment failed due to Ceph errors:

Tue Jul 01 16:34:10 +0000 2014 /Stage[main]/Ceph::Mon/Exec[ceph-deploy mon create]/returns (notice): [192.168.0.2][DEBUG ] remote hostname: node-1
Tue Jul 01 16:34:10 +0000 2014 /Stage[main]/Ceph::Mon/Exec[ceph-deploy mon create]/returns (notice): [192.168.0.2][DEBUG ] write cluster configuration to /etc/ceph/{cluster}.conf
Tue Jul 01 16:34:10 +0000 2014 /Stage[main]/Ceph::Mon/Exec[ceph-deploy mon create]/returns (notice): [ceph_deploy.mon][ERROR ] RuntimeError: config file /etc/ceph/ceph.conf exists with different content; use --overwrite-conf to overwrite
Tue Jul 01 16:34:10 +0000 2014 /Stage[main]/Ceph::Mon/Exec[ceph-deploy mon create]/returns (notice): [ceph_deploy][ERROR ] GenericError: Failed to create 1 monitors
Tue Jul 01 16:34:10 +0000 2014 Puppet (err): ceph-deploy mon create node-1:192.168.0.2 returned 1 instead of one of [0]
/usr/lib/ruby/site_ruby/1.8/puppet/util/errors.rb:97:in `fail'
/usr/lib/ruby/site_ruby/1.8/puppet/type/exec.rb:120:in `sync'
/usr/lib/ruby/site_ruby/1.8/puppet/transaction/resource_harness.rb:193:in `sync'
/usr/lib/ruby/site_ruby/1.8/puppet/transaction/resource_harness.rb:130:in `sync_if_needed'
/usr/lib/ruby/site_ruby/1.8/puppet/transaction/resource_harness.rb:82:in `perform_changes'
/usr/lib/ruby/site_ruby/1.8/puppet/transaction/resource_harness.rb:81:in `each'
/usr/lib/ruby/site_ruby/1.8/puppet/transaction/resource_harness.rb:81:in `perform_changes'
/usr/lib/ruby/site_ruby/1.8/puppet/transaction/resource_harness.rb:18:in `evaluate'
/usr/lib/ruby/site_ruby/1.8/puppet/transaction.rb:174:in `apply'
/usr/lib/ruby/site_ruby/1.8/puppet/transaction.rb:187:in `eval_resource'
/usr/lib/ruby/site_ruby/1.8/puppet/transaction.rb:117:in `call'
/usr/lib/ruby/site_ruby/1.8/puppet/transaction.rb:117:in `evaluate'
/usr/lib/ruby/site_ruby/1.8/puppet/util.rb:327:in `thinmark'
/usr/lib/ruby/1.8/benchmark.rb:308:in `realtime'
/usr/lib/ruby/site_ruby/1.8/puppet/util.rb:326:in `thinmark'

Changed in fuel:
importance: High → Critical
Changed in fuel:
assignee: Sergii Golovatiuk (sgolovatiuk) → Dmitry Borodaenko (dborodaenko)
Dmitry Borodaenko (angdraug) wrote :

ISO #265, where this bug was found, had ceph 0.67.8 and ceph-deploy 1.2.7. This means the bug is not related to https://bugs.launchpad.net/fuel/+bug/1333814, but it is also likely no longer reproducible.

Dmitry Borodaenko (angdraug) wrote :

The "ceph-deploy mon create" failure is related to https://bugs.launchpad.net/fuel/+bug/1333814 and is not applicable to this bug. There were no failures like that in the attached snapshot; the problem here is that the deployment took too long and timed out. Downgrading back to High.

Changed in fuel:
importance: Critical → High
Dmitry Borodaenko (angdraug) wrote :

Noticeable time gaps in puppet.log for node-4:

Tue Jun 24 11:42:29 +0000 2014 /Stage[main]/Galera/Cs_commit[p_mysql]/cib (notice): defined 'cib' as 'mysql'
Tue Jun 24 11:44:36 +0000 2014 /Stage[main]/Galera/Service[mysql]/ensure (notice): ensure changed 'stopped' to 'running'

Tue Jun 24 11:48:41 +0000 2014 /Stage[main]/Nova/Nova_config[DEFAULT/use_syslog]/ensure (notice): created
Tue Jun 24 11:54:11 +0000 2014 /Stage[main]/Nova::Utilities/Package[guestmount]/ensure (notice): ensure changed 'purged' to 'present'

Tue Jun 24 12:05:50 +0000 2014 /Stage[main]/Keystone::Roles::Admin/Keystone_role[admin]/ensure (notice): created
Tue Jun 24 12:08:45 +0000 2014 /Stage[main]/Heat::Keystone::Auth/Keystone_user_role[heat@services]/roles (notice): roles changed ['_member_'] to 'admin'

Tue Jun 24 12:15:52 +0000 2014 /Stage[main]/Neutron::Network::Predefined_networks/Neutron_net[net04]/ensure (notice): created
Tue Jun 24 12:17:18 +0000 2014 /Stage[main]/Neutron::Network::Predefined_networks/Neutron_subnet[net04__subnet]/ensure (notice): created

Tue Jun 24 12:20:42 +0000 2014 /Stage[main]/Neutron::Network::Predefined_networks/Neutron_net[net04_ext]/ensure (notice): created
Tue Jun 24 12:21:16 +0000 2014 /Stage[main]/Neutron::Network::Predefined_networks/Neutron_subnet[net04_ext__subnet]/ensure (notice): created
Tue Jun 24 12:23:58 +0000 2014 /Stage[main]/Neutron::Network::Predefined_networks/Neutron_router[router04]/ensure (notice): created
Tue Jun 24 12:30:15 +0000 2014 /Stage[main]/Neutron::Network::Predefined_networks/Neutron_floatingip_pool[admin]/ensure (notice): created

Tue Jun 24 12:31:39 +0000 2014 /Stage[main]/Neutron::Agents::Ovs/Service[neutron-ovs-agent]/enable (notice): enable changed 'true' to 'true'
Tue Jun 24 12:33:49 +0000 2014 /Stage[main]/Neutron::Agents::Ovs/Service[neutron-ovs-agent] (notice): Triggered 'refresh' from 1 events

Tue Jun 24 12:37:51 +0000 2014 /Stage[main]/Neutron::Agents::Dhcp/Service[neutron-dhcp-service]/enable (notice): enable changed 'true' to 'true'
Tue Jun 24 12:40:03 +0000 2014 /Stage[main]/Neutron::Agents::Dhcp/Service[neutron-dhcp-service] (notice): Triggered 'refresh' from 1 events

Tue Jun 24 12:42:21 +0000 2014 /Stage[main]/Neutron::Agents::L3/Cs_commit[l3]/cib (notice): defined 'cib' as 'l3'
Tue Jun 24 12:43:33 +0000 2014 /Stage[main]/Neutron::Agents::L3/Service[neutron-l3]/enable (notice): enable changed 'true' to 'true'
Tue Jun 24 12:46:13 +0000 2014 /Stage[main]/Neutron::Agents::L3/Service[neutron-l3] (notice): Triggered 'refresh' from 1 events

Tue Jun 24 12:50:46 +0000 2014 /Stage[main]/Heat::Engine/Service[heat-engine]/enable (notice): enable changed 'true' to 'true'
Tue Jun 24 12:52:51 +0000 2014 /Stage[main]/Heat::Engine/Service[heat-engine] (notice): Triggered 'refresh' from 42 events

Tue Jun 24 12:54:39 +0000 2014 /Stage[main]/Ceilometer::Collector/Service[ceilometer-collector]/ensure (notice): ensure changed 'stopped' to 'running'
Tue Jun 24 12:56:46 +0000 2014 /Stage[main]/Ceilometer::Agent::Central/Service[ceilometer-agent-central]/ensure (notice): ensure changed 'stopped' to 'running'

Tue Jun 24 12:57:15 +0000 2014 /Stage[main]/Openstack::Ha::Ceilometer/Openstack::Ha::Haproxy_service[cei...


Dmitry Borodaenko (angdraug) wrote :

The gaps are diverse enough, and include the usual suspects such as puppet corosync resources (very CPU heavy due to ruby1.8 and rexml) and neutron, so it's likely that the test failed because the test environment was starved for CPU.
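
For anyone who wants to repeat the time-gap check on another node's puppet.log, the following is a minimal sketch, not the method actually used for the list above. It assumes every log line of interest starts with a timestamp in the same form as the excerpts in this report (for example "Tue Jun 24 11:42:29 +0000 2014"). The names find_gaps and parse_line and the 60-second threshold are hypothetical choices made for illustration.

```python
from datetime import datetime, timedelta

# Timestamp layout as seen in the puppet.log excerpts in this report.
# %a and %b rely on English day/month names (the default C locale).
TS_FORMAT = "%a %b %d %H:%M:%S %z %Y"
TS_LENGTH = len("Tue Jun 24 11:42:29 +0000 2014")


def parse_line(line):
    """Return (timestamp, rest_of_line), or None if the line has no leading timestamp."""
    try:
        ts = datetime.strptime(line[:TS_LENGTH], TS_FORMAT)
    except ValueError:
        return None
    return ts, line[TS_LENGTH:].strip()


def find_gaps(lines, threshold=timedelta(seconds=60)):
    """Return (gap, previous_message, next_message) for consecutive
    timestamped lines that are at least `threshold` apart."""
    gaps = []
    prev = None
    for line in lines:
        parsed = parse_line(line)
        if parsed is None:
            continue  # skip blank lines, stack traces and other untimestamped output
        if prev is not None and parsed[0] - prev[0] >= threshold:
            gaps.append((parsed[0] - prev[0], prev[1], parsed[1]))
        prev = parsed
    return gaps


# Two consecutive lines quoted from node-4 in this report, plus one line
# without a timestamp to show that such lines are ignored.
SAMPLE_LINES = [
    "Tue Jun 24 11:42:29 +0000 2014 /Stage[main]/Galera/Cs_commit[p_mysql]/cib (notice): defined 'cib' as 'mysql'",
    "/usr/lib/ruby/site_ruby/1.8/puppet/util.rb:327:in `thinmark'",
    "Tue Jun 24 11:44:36 +0000 2014 /Stage[main]/Galera/Service[mysql]/ensure (notice): ensure changed 'stopped' to 'running'",
]

if __name__ == "__main__":
    for gap, before, after in find_gaps(SAMPLE_LINES):
        print(f"{gap}  between:\n  {before}\n  {after}")
```

To run it against a real log, pass the open file object instead of SAMPLE_LINES, for example find_gaps(open("puppet.log")). A gap only shows that puppet logged nothing in that interval; it does not by itself say whether the cause was CPU starvation or something else.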

Changed in fuel:
status: Confirmed → Incomplete
assignee: Dmitry Borodaenko (dborodaenko) → Anastasia Palkina (apalkina)
Dmitry Ilyin (idv1985)
summary: - Deployment with 5 controllers and many other nodes has failed. Timeout
- is exceeded.
+ [library Deployment with 5 controllers and many other nodes has failed.
+ Timeout is exceeded.
summary: - [library Deployment with 5 controllers and many other nodes has failed.
+ [library] Deployment with 5 controllers and many other nodes has failed.
Timeout is exceeded.
Anastasia Palkina (apalkina) wrote :

Reproduced on ISO #347
"build_id": "2014-07-23_02-01-14",
"ostf_sha": "c1b60d4bcee7cd26823079a86e99f3f65414498e",
"build_number": "347",
"auth_required": false,
"api": "1.0",
"nailgun_sha": "f5775d6b7f5a3853b28096e8c502ace566e7041f",
"production": "docker",
"fuelmain_sha": "74b9200955201fe763526ceb51607592274929cd",
"astute_sha": "fd9b8e3b6f59b2727b1b037054f10e0dd7bd37f1",
"feature_groups": ["mirantis"],
"release": "5.1",
"fuellib_sha": "fb0e84c954a33c912584bf35054b60914d2a2360"

1. Create new environment (Ubuntu, HA mode)
2. Choose GRE segmentation
3. Choose both Ceph
4. Choose Ceilometer
5. Add 2 controllers, 3 controller+mongo, compute, cinder+mongo, 2 ceph
6. Start deployment. It fails: the deployment timeout is exceeded.

The first controller took 2 hours to deploy.

Controllers in logs: node-2,3,4,11,12

Anastasia Palkina (apalkina) wrote :
Changed in fuel:
status: Incomplete → Confirmed
assignee: Anastasia Palkina (apalkina) → Fuel Library Team (fuel-library)