HA deployment failed with "timeout of deployment is exceeded" error

Bug #1249337 reported by Max Grishkin
This bug affects 2 people
Affects: Fuel for OpenStack
Status: Invalid
Importance: Undecided
Assigned to: Unassigned

Bug Description

Environment settings:
1. CentOS, HA enabled, ceph enabled, 3 controller nodes
2. Nova-network with VLAN manager, every network is assigned to separate interface, vlan tagging is off for all networks except for fixed network
3. Murano is enabled
CentOS is installed successfully on all 3 nodes; the catalog run finished in 1578 seconds on the 1st node, in 2007 seconds on the 2nd node and in 4156 seconds on the 3rd node. The whole deployment then fails with "Timeout of deployment is exceeded.". Most of the time seems to be consumed by the Murano installation.
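
For scale, the three catalog run times can be converted to hours and minutes. A minimal Python sketch; the summed total only makes sense under the assumption that the runs happened one after another, which the report does not state:

```python
from datetime import timedelta

# Catalog run durations in seconds, as reported above (nodes 1 to 3).
catalog_runs = [1578, 2007, 4156]

def fmt(seconds):
    """Format a duration in seconds as H:MM:SS."""
    return str(timedelta(seconds=seconds))

per_node = [fmt(s) for s in catalog_runs]

# Assumption: the runs were sequential, so the total is their sum.
total = fmt(sum(catalog_runs))

print(per_node, total)
```

The run on the 3rd node alone takes more than twice as long as either of the other two.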

Max Grishkin (grishkin) wrote :

http://10.20.0.2:8000/api/version
{"release": "3.2.1", "nailgun_sha": "a8d57bce0e284a2aa87d24e3e5b58b1f289a960e", "ostf_sha": "ce6aabb1f24c5328e316c49f965b4246a5ca5ca6", "astute_sha": "df6ddea3abc93fbe1cab9b4534d4d5e9508c95d6", "fuellib_sha": "4039359fac51de0951fa81d457cddc9f318c35dc"}
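
The version payload above can be parsed to pull out the component SHAs, which makes it easier to compare builds across comments. A minimal Python sketch using the JSON from this comment verbatim; the helper name `short_shas` and the 8-character abbreviation are illustrative choices, not part of Fuel:

```python
import json

# Response body of /api/version, copied from the comment above.
VERSION_JSON = '''{"release": "3.2.1", "nailgun_sha": "a8d57bce0e284a2aa87d24e3e5b58b1f289a960e", "ostf_sha": "ce6aabb1f24c5328e316c49f965b4246a5ca5ca6", "astute_sha": "df6ddea3abc93fbe1cab9b4534d4d5e9508c95d6", "fuellib_sha": "4039359fac51de0951fa81d457cddc9f318c35dc"}'''

def short_shas(payload, length=8):
    """Map each component (key ending in '_sha') to an abbreviated SHA."""
    data = json.loads(payload)
    return {
        key[:-len("_sha")]: value[:length]
        for key, value in data.items()
        if key.endswith("_sha")
    }

release = json.loads(VERSION_JSON)["release"]
shas = short_shas(VERSION_JSON)
print(release, shas)
```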

Roman Vyalov (r0mikiam)
Changed in fuel:
assignee: nobody → Timur Nurlygayanov (tnurlygayanov)
Changed in fuel:
status: New → Incomplete
Timur Nurlygayanov (tnurlygayanov) wrote :

We need more information.
I can see the same behavior on my desktop with a SATA HDD (VirtualBox hypervisor), but it is not reproduced on my other desktop with an Intel Core i5 and a 128 GB SSD for VMs (KVM hypervisor).

I also believe that Murano is not the root of this problem: I can see no errors in the logs during the Murano installation.
Can you please try to install the same configuration without Ceph? How long does it take to deploy OpenStack in HA mode on this lab?

Max Grishkin (grishkin) wrote :

Retested the same deployment without Ceph; the deployment lasted 2h20m.
The timeout error did not appear, but I got a cirros image upload error on the 3rd node:
[8836] Error running RPC method deploy: Upload cirros image failed, trace: ["/opt/rbenv/versions/1.9.3-p392/lib/ruby/gems/1.9.1/gems/astute-0.0.2/lib/astute/orchestrator.rb:281:in `upload_cirros_image'", "/opt/rbenv/versions/1.9.3-p392/lib/ruby/gems/1.9.1/gems/astute-0.0.2/lib/astute/orchestrator.rb:46:in `deploy'", "/opt/rbenv/versions/1.9.3-p392/lib/ruby/gems/1.9.1/gems/naily-0.1.0/lib/naily/dispatcher.rb:96:in `deploy'", "/opt/rbenv/versions/1.9.3-p392/lib/ruby/gems/1.9.1/gems/naily-0.1.0/lib/naily/server.rb:103:in `dispatch_message'", "/opt/rbenv/versions/1.9.3-p392/lib/ruby/gems/1.9.1/gems/naily-0.1.0/lib/naily/server.rb:70:in `block in dispatch'", "/opt/rbenv/versions/1.9.3-p392/lib/ruby/gems/1.9.1/gems/naily-0.1.0/lib/naily/server.rb:68:in `each'", "/opt/rbenv/versions/1.9.3-p392/lib/ruby/gems/1.9.1/gems/naily-0.1.0/lib/naily/server.rb:68:in `each_with_index'", "/opt/rbenv/versions/1.9.3-p392/lib/ruby/gems/1.9.1/gems/naily-0.1.0/lib/naily/server.rb:68:in `dispatch'", "/opt/rbenv/versions/1.9.3-p392/lib/ruby/gems/1.9.1/gems/naily-0.1.0/lib/naily/server.rb:38:in `block (2 levels) in server_loop'", "/opt/rbenv/versions/1.9.3-p392/lib/ruby/gems/1.9.1/gems/naily-0.1.0/lib/naily/server.rb:49:in `block (2 levels) in consume_one'"]
2013-11-11 11:59:06 ERR
[8836] 346e1060-07df-4d87-8dc0-53df87a7e4f5: Upload cirros image failed
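
The trace above is a list of Ruby backtrace frames of the form `path:line:in 'method'` (with a leading backtick). To make it easier to read, each frame can be reduced to file, line and method. A minimal Python sketch over the first three frames, copied verbatim; the regular expression and the `parse_frame` helper are illustrative, not part of astute or naily:

```python
import posixpath
import re

# Ruby 1.9 backtrace frame: "<path>:<line>:in `<method>'"
FRAME_RE = re.compile(r"^(?P<path>.+):(?P<line>\d+):in `(?P<method>.+)'$")

# First three frames of the trace above.
frames = [
    "/opt/rbenv/versions/1.9.3-p392/lib/ruby/gems/1.9.1/gems/astute-0.0.2/lib/astute/orchestrator.rb:281:in `upload_cirros_image'",
    "/opt/rbenv/versions/1.9.3-p392/lib/ruby/gems/1.9.1/gems/astute-0.0.2/lib/astute/orchestrator.rb:46:in `deploy'",
    "/opt/rbenv/versions/1.9.3-p392/lib/ruby/gems/1.9.1/gems/naily-0.1.0/lib/naily/dispatcher.rb:96:in `deploy'",
]

def parse_frame(frame):
    """Return (file name, line number, method) for one backtrace frame."""
    match = FRAME_RE.match(frame)
    if match is None:
        raise ValueError("not a backtrace frame: %r" % frame)
    return (
        posixpath.basename(match.group("path")),
        int(match.group("line")),
        match.group("method"),
    )

parsed = [parse_frame(f) for f in frames]
for name, line, method in parsed:
    print("%s:%d %s" % (name, line, method))
```

The innermost frame is `upload_cirros_image` in `orchestrator.rb`, called from the orchestrator's `deploy` method.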

Timur Nurlygayanov (tnurlygayanov) wrote :

Yes, thank you for this information.
We can see that this is an issue with the Fuel deployment, not with the Murano deployment.
I suggest changing the assignee to someone who can fix it.

Changed in fuel:
assignee: Timur Nurlygayanov (tnurlygayanov) → nobody
summary: - HA deployment with ceph and Murano failed with "timeout of deployment is
- exceeded" error
+ HA deployment failed with "timeout of deployment is exceeded" error
Changed in fuel:
assignee: nobody → Dmitry Pyzhov (lux-place)
Dmitry Pyzhov (dpyzhov) wrote :

This error appears if glance is broken after deployment. Our library team needs access to the cluster in order to understand what is wrong.

Mike Scherbakov (mihgen)
Changed in fuel:
milestone: none → 4.0
Dmitry Pyzhov (dpyzhov)
Changed in fuel:
assignee: Dmitry Pyzhov (lux-place) → nobody
Nikolay Fedotov (nfedotov) wrote :

Can reproduce on {"release": "4.0", "nailgun_sha": "d79a914cc093b4e1bb8fb87d8d85d5657097f37f", "ostf_sha": "cf48dac2a6e7ad284fc93c529f3d1e4668504028", "astute_sha": "d14e9d2475c55b29a5d475f7c112d979c6251ff4", "fuellib_sha": "219cb747129ca19b53f42ede0e3257d18e915d9e"}

Vladimir Kuklin (vkuklin) wrote :

Could you attach a diagnostic snapshot, please?

Changed in fuel:
status: Incomplete → Confirmed
status: Confirmed → Incomplete
Dmitry Pyzhov (dpyzhov) wrote :

The 4.0 release is no longer supported.

Changed in fuel:
status: Incomplete → Invalid