Deploy for Ubuntu simple fails with Cobbler error. Error running provisioning: Net::ReadTimeout,

Bug #1387699 reported by Anastasia Palkina
This bug affects 2 people
Affects             Status         Importance  Assigned to        Milestone
Fuel for OpenStack  Fix Released   High        Vladimir Sharshov
5.1.x               Fix Committed  High        Vladimir Sharshov

Bug Description

"build_id": "2014-10-30_04-21-22",
"ostf_sha": "f47fd1d66a7255213ee075d5c11b8f111f922000",
"build_number": "63",
"auth_required": true,
"api": "1.0",
"nailgun_sha": "02c6bb2e54bbec76da33167eaf5f2e0b3e2e50a7",
"production": "docker",
"fuelmain_sha": "2ade7c571380a091048d103a6affff634b5b2520",
"astute_sha": "97eea90efe0a1f17b4934919d6e459d270c10372",
"feature_groups": ["mirantis", "techpreview"],
"release": "6.0-techpreview",
"release_versions": {"2014.2-6.0-techpreview": {"VERSION": {"build_id": "2014-10-30_04-21-22", "ostf_sha": "f47fd1d66a7255213ee075d5c11b8f111f922000", "build_number": "63", "api": "1.0", "nailgun_sha": "02c6bb2e54bbec76da33167eaf5f2e0b3e2e50a7", "production": "docker", "fuelmain_sha": "2ade7c571380a091048d103a6affff634b5b2520", "astute_sha": "97eea90efe0a1f17b4934919d6e459d270c10372", "feature_groups": ["mirantis", "techpreview"], "release": "6.0-techpreview", "fuellib_sha": "45b6fc42091a0a33d3e48fbe78b782ce743aedc1"}}},
"fuellib_sha": "45b6fc42091a0a33d3e48fbe78b782ce743aedc1"

1. Create a new environment (Ubuntu, simple mode)
2. Choose Neutron with VLAN
3. Choose both Ceph options
4. Add 1 controller+ceph, 1 compute+ceph, 1 ceph
5. Start deployment
6. Deployment fails during provisioning with a Cobbler error in the UI, but the deployment continues (see screen). The nodes go offline in the UI. When provisioning finishes, the nodes come back online with status 'error' (see screen)

I reproduced this bug 2 times.

Evgeniy L (rustyrobot)
Changed in fuel:
assignee: Fuel Python Team (fuel-python) → Fuel Astute Team (fuel-astute)
Evgeniy L (rustyrobot)
Changed in fuel:
status: New → Confirmed
Revision history for this message
Anastasia Palkina (apalkina) wrote :

Error in astute.log:

[431] Error running provisioning: Net::ReadTimeout, trace:
["/usr/lib64/ruby/2.1.0/net/protocol.rb:158:in `rescue in rbuf_fill'",
 "/usr/lib64/ruby/2.1.0/net/protocol.rb:152:in `rbuf_fill'",
 "/usr/lib64/ruby/2.1.0/net/protocol.rb:134:in `readuntil'",
 "/usr/lib64/ruby/2.1.0/net/protocol.rb:144:in `readline'",
 "/usr/lib64/ruby/2.1.0/net/http/response.rb:39:in `read_status_line'",
 "/usr/lib64/ruby/2.1.0/net/http/response.rb:28:in `read_new'",
 "/usr/lib64/ruby/2.1.0/net/http.rb:1408:in `block in transport_request'",
 "/usr/lib64/ruby/2.1.0/net/http.rb:1405:in `catch'",
 "/usr/lib64/ruby/2.1.0/net/http.rb:1405:in `transport_request'",
 "/usr/lib64/ruby/2.1.0/net/http.rb:1378:in `request'",
 "/usr/lib64/ruby/2.1.0/net/http.rb:1324:in `request_post'",
 "/usr/lib64/ruby/2.1.0/xmlrpc/client.rb:482:in `do_rpc'",
 "/usr/lib64/ruby/2.1.0/xmlrpc/client.rb:286:in `call2'",
 "/usr/lib64/ruby/2.1.0/xmlrpc/client.rb:267:in `call'",
 "/usr/lib64/ruby/gems/2.1.0/gems/astute-0.0.2/lib/astute/cobbler.rb:108:in `sync'",
 "/usr/lib64/ruby/gems/2.1.0/gems/astute-0.0.2/lib/astute/cobbler_manager.rb:131:in `sync'",
 "/usr/lib64/ruby/gems/2.1.0/gems/astute-0.0.2/lib/astute/cobbler_manager.rb:83:in `reboot_nodes'",
 "/usr/lib64/ruby/gems/2.1.0/gems/astute-0.0.2/lib/astute/orchestrator.rb:77:in `provision'",
 "/usr/lib64/ruby/gems/2.1.0/gems/astute-0.0.2/lib/astute/server/dispatcher.rb:41:in `provision'",
 "/usr/lib64/ruby/gems/2.1.0/gems/astute-0.0.2/lib/astute/server/server.rb:136:in `dispatch_message'",
 "/usr/lib64/ruby/gems/2.1.0/gems/astute-0.0.2/lib/astute/server/server.rb:99:in `block in dispatch'",
 "/usr/lib64/ruby/gems/2.1.0/gems/astute-0.0.2/lib/astute/server/task_queue.rb:64:in `call'",
 "/usr/lib64/ruby/gems/2.1.0/gems/astute-0.0.2/lib/astute/server/task_queue.rb:64:in `block in each'",
 "/usr/lib64/ruby/gems/2.1.0/gems/astute-0.0.2/lib/astute/server/task_queue.rb:56:in `each'",
 "/usr/lib64/ruby/gems/2.1.0/gems/astute-0.0.2/lib/astute/server/task_queue.rb:56:in `each'",
 "/usr/lib64/ruby/gems/2.1.0/gems/astute-0.0.2/lib/astute/server/server.rb:97:in `each_with_index'",
 "/usr/lib64/ruby/gems/2.1.0/gems/astute-0.0.2/lib/astute/server/server.rb:97:in `dispatch'",
 "/usr/lib64/ruby/gems/2.1.0/gems/astute-0.0.2/lib/astute/server/server.rb:85:in `block in perform_main_job'"]

summary: - Deploy for Ubuntu simple fails with Cobbler error
+ Deploy for Ubuntu simple fails with Cobbler error. Error running
+ provisioning: Net::ReadTimeout,
Revision history for this message
Anastasia Palkina (apalkina) wrote :

Reproduced it again for CentOS, simple

Changed in fuel:
assignee: Fuel Astute Team (fuel-astute) → Vladimir Sharshov (vsharshov)
Revision history for this message
Vladimir Sharshov (vsharshov) wrote :

As Nastya described, the key to the problem is two large parallel deployments starting at approximately the same time.

I will try to reproduce it; if I cannot, I will mark the bug as Incomplete until it is reproduced again.

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to fuel-astute (master)

Fix proposed to branch: master
Review: https://review.openstack.org/134929

Changed in fuel:
status: Confirmed → In Progress
Revision history for this message
Dennis Dmitriev (ddmitriev) wrote :

Reproduced on a custom ISO built with the proposed patch: http://jenkins-product.srt.mirantis.net:8080/view/custom_iso/job/custom_6_0_iso/502/

Two clusters (1 controller + 4 compute) were created, one with CentOS and the other with Ubuntu.
The CentOS cluster failed with the same "Cobbler error", and all of its nodes were marked as 'failed'.

==== astute.log
2014-11-20T10:22:07 warning: [416] Cobbler problem. Try to repeat: 1 attempt
...
2014-11-20T10:22:51 warning: [416] Cobbler problem. Try to repeat: 2 attempt
2014-11-20T10:23:14 err: [413] Error occured while provisioning: #<XMLRPC::FaultException: <type 'exceptions.OSError'>:[Errno 2] No such file or directory: '/var/lib/tftpboot/images/bootstrap/initramfs.img'>
2014-11-20T10:23:14 info: [413] Casting message to Nailgun: {"method"=>"provision_resp", "args"=>{"task_uuid"=>"a2510311-eccb-4c01-a562-70efa7b54b82", "status"=>"error", "error"=>"Cobbler error", "progress"=>100}}
2014-11-20T10:23:16 debug: [413] Unlock discovery for failed nodes. Result: []
2014-11-20T10:23:16 err: [413] Error running provisioning: <type 'exceptions.OSError'>:[Errno 2] No such file or directory: '/var/lib/tftpboot/images/bootstrap/initramfs.img', trace:
["/usr/lib64/ruby/2.1.0/xmlrpc/client.rb:271:in `call'",
 "/usr/lib64/ruby/gems/2.1.0/gems/astute-0.0.2/lib/astute/cobbler.rb:108:in `sync'",
 "/usr/lib64/ruby/gems/2.1.0/gems/astute-0.0.2/lib/astute/cobbler_manager.rb:131:in `sync'",
 "/usr/lib64/ruby/gems/2.1.0/gems/astute-0.0.2/lib/astute/cobbler_manager.rb:83:in `reboot_nodes'",
 "/usr/lib64/ruby/gems/2.1.0/gems/astute-0.0.2/lib/astute/orchestrator.rb:77:in `provision'",
 "/usr/lib64/ruby/gems/2.1.0/gems/astute-0.0.2/lib/astute/server/dispatcher.rb:41:in `provision'",
 "/usr/lib64/ruby/gems/2.1.0/gems/astute-0.0.2/lib/astute/server/server.rb:142:in `dispatch_message'",
 "/usr/lib64/ruby/gems/2.1.0/gems/astute-0.0.2/lib/astute/server/server.rb:103:in `block in dispatch'",
 "/usr/lib64/ruby/gems/2.1.0/gems/astute-0.0.2/lib/astute/server/task_queue.rb:64:in `call'",
 "/usr/lib64/ruby/gems/2.1.0/gems/astute-0.0.2/lib/astute/server/task_queue.rb:64:in `block in each'",
 "/usr/lib64/ruby/gems/2.1.0/gems/astute-0.0.2/lib/astute/server/task_queue.rb:56:in `each'",
 "/usr/lib64/ruby/gems/2.1.0/gems/astute-0.0.2/lib/astute/server/task_queue.rb:56:in `each'",
 "/usr/lib64/ruby/gems/2.1.0/gems/astute-0.0.2/lib/astute/server/server.rb:101:in `each_with_index'",
 "/usr/lib64/ruby/gems/2.1.0/gems/astute-0.0.2/lib/astute/server/server.rb:101:in `dispatch'",
 "/usr/lib64/ruby/gems/2.1.0/gems/astute-0.0.2/lib/astute/server/server.rb:85:in `block in perform_main_job'"]

2014-11-20T10:23:16 debug: [413] Dispatching aborted by provision
===========================
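The excerpt above can be summarized mechanically when checking whether the retries were exhausted before the final error. The following is a minimal sketch in Ruby; the helper name `summarize_astute_log` and the line pattern are assumptions inferred from the log lines quoted here, not part of Astute.

```ruby
# Hypothetical helper, not part of Astute: count Cobbler retry warnings and
# collect 'err' messages from astute.log text.
# The pattern assumes lines shaped like "<timestamp> <level>: [<pid>] <message>",
# as in the excerpt above; stack-trace lines and "..." do not match and are skipped.
ASTUTE_LOG_LINE = /\A(?<time>\S+) (?<level>\w+): \[(?<pid>\d+)\] (?<msg>.*)\z/

def summarize_astute_log(log_text)
  retries = 0
  errors = []
  log_text.each_line do |line|
    m = ASTUTE_LOG_LINE.match(line.chomp)
    next unless m
    retries += 1 if m[:level] == 'warning' && m[:msg].start_with?('Cobbler problem')
    errors << m[:msg] if m[:level] == 'err'
  end
  { retries: retries, errors: errors }
end
```

Run against the excerpt above, this would report two retry warnings followed by the `err` entries, which matches the sequence described in this comment.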

See the attached diagnostic snapshot. Please note that the snapshot contains several start-deployment/stop-provisioning attempts made before the bug appeared.

On another environment I hit the same bug with a different 'No such file or directory' error:
===========================
2014-11-19T13:50:41 err: [405] Error running provisioning: <type 'exceptions.OSError'>:[Errno 2] No such file or directory: '/var/lib/tftpboot/images/centos-x86_64/vmlinuz', trace:
["/usr/lib64/ruby/2.1.0/xmlrpc/cli...


Revision history for this message
Dennis Dmitriev (ddmitriev) wrote :

{"build_id": "2014-11-17_13-47-05", "ostf_sha": "82465a94eed4eff1fc8d8e1f2fb7e9993c22f068", "build_number": "502", "auth_required": true, "api": "1.0", "nailgun_sha": "87100dca641d76fd560eac7f7894d9413d8c186d", "production": "docker", "fuelmain_sha": "b585c7082511936ca3ac27e7ed12d1e2386feb90", "astute_sha": "d4fd7048befd22feecf5ba40f17981df5b608621", "feature_groups": ["mirantis"], "release": "6.0", "release_versions": {"2014.2-6.0": {"VERSION": {"build_id": "2014-11-17_13-47-05", "ostf_sha": "82465a94eed4eff1fc8d8e1f2fb7e9993c22f068", "build_number": "502", "api": "1.0", "nailgun_sha": "87100dca641d76fd560eac7f7894d9413d8c186d", "production": "docker", "fuelmain_sha": "b585c7082511936ca3ac27e7ed12d1e2386feb90", "astute_sha": "d4fd7048befd22feecf5ba40f17981df5b608621", "feature_groups": ["mirantis"], "release": "6.0", "fuellib_sha": "0f51ec2c95ea031ef1190b86d336bee5779b7ed7"}}}, "fuellib_sha": "0f51ec2c95ea031ef1190b86d336bee5779b7ed7"}

Revision history for this message
Vladimir Sharshov (vsharshov) wrote :

As far as I can tell, this is a problem with Cobbler itself. The fix worked and retried the request to Cobbler several times, but the Cobbler failure here is fatal and, as far as I can see, unrelated to the fix.

Revision history for this message
Baboune (seyvet) wrote :

Got the same problem on 5.1 with two parallel environments (13 nodes and 3 nodes), using CentOS Icehouse, Ceph for Cinder, default Glance, and Neutron VLAN.

Failed to generate a snapshot: "Dump is timed out". Will try again later.

Revision history for this message
Baboune (seyvet) wrote :

"Dump timed out" again. Cannot upload the snapshot.

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to fuel-astute (master)

Reviewed: https://review.openstack.org/134929
Committed: https://git.openstack.org/cgit/stackforge/fuel-astute/commit/?id=0bcc91185984a1f9e4bd68d9f247266c35349dae
Submitter: Jenkins
Branch: master

commit 0bcc91185984a1f9e4bd68d9f247266c35349dae
Author: Vladimir Sharshov (warpc) <email address hidden>
Date: Mon Nov 17 16:01:14 2014 +0300

    Add retries in case of Cobbler overloading

    Offen if Cobbler run two big task in parallel
    we got error based on Cobbler overloading.
    If we wait and try request again, it will
    process without problem. This change
    implement such behavior.

    Change-Id: I1e9774b9398dfb502159d7ba21e7e5b55b75b3a1
    Closes-Bug: #1387699
    Closes-Bug: #1396181
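
The behavior this commit message describes (wait, then repeat the request when Cobbler is overloaded) can be illustrated with a minimal Ruby sketch. This is not the code from the merged change: the helper name `with_retries`, its parameters, and the default attempt count and delay are assumptions chosen for illustration, and the warning text only imitates the astute.log lines quoted earlier in this report.

```ruby
require 'net/protocol' # defines Net::ReadTimeout

# Hypothetical helper, not Astute's actual implementation: run the block and,
# if it raises one of the listed errors, wait and run it again, up to
# `attempts` tries in total. The last failure is re-raised to the caller.
def with_retries(attempts: 3, delay: 5, errors: [Net::ReadTimeout])
  tries = 0
  begin
    tries += 1
    yield tries
  rescue *errors => e
    raise if tries >= attempts
    warn "Cobbler problem. Try to repeat: #{tries} attempt (#{e.class})"
    sleep delay
    retry
  end
end
```

In this sketch the block would wrap the XML-RPC `sync` request that times out in the trace above. Only the listed error classes are retried, so an unexpected exception still propagates immediately.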

Changed in fuel:
status: In Progress → Fix Committed
Revision history for this message
Baboune (seyvet) wrote :

Deployed the patch on a 5.1 setup and it worked.

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to fuel-astute (stable/5.1)

Fix proposed to branch: stable/5.1
Review: https://review.openstack.org/137553

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to fuel-astute (stable/5.1)

Reviewed: https://review.openstack.org/137553
Committed: https://git.openstack.org/cgit/stackforge/fuel-astute/commit/?id=ef8aa0fd0e3ce20709612906f1f0551b5682a6ce
Submitter: Jenkins
Branch: stable/5.1

commit ef8aa0fd0e3ce20709612906f1f0551b5682a6ce
Author: Vladimir Sharshov (warpc) <email address hidden>
Date: Mon Nov 17 16:01:14 2014 +0300

    Add retries in case of Cobbler overloading

    Offen if Cobbler run two big task in parallel
    we got error based on Cobbler overloading.
    If we wait and try request again, it will
    process without problem. This change
    implement such behavior.

    Change-Id: I1e9774b9398dfb502159d7ba21e7e5b55b75b3a1
    Closes-Bug: #1387699
    Closes-Bug: #1396181
    (cherry picked from commit 0bcc91185984a1f9e4bd68d9f247266c35349dae)

Revision history for this message
Anastasia Palkina (apalkina) wrote :

Cannot reproduce on the latest 6.0 ISOs, #49 and #56.

"build_id": "2014-12-18_01-32-01", "ostf_sha": "a9afb68710d809570460c29d6c3293219d3624d4", "build_number": "56", "auth_required": true, "api": "1.0", "nailgun_sha": "5f91157daa6798ff522ca9f6d34e7e135f150a90", "production": "docker", "fuelmain_sha": "45caacadb878abfbd9d60e134d72229698b469c9", "astute_sha": "16b252d93be6aaa73030b8100cf8c5ca6a970a91", "feature_groups": ["mirantis"], "release": "6.0", "release_versions": {"2014.2-6.0": {"VERSION": {"build_id": "2014-12-18_01-32-01", "ostf_sha": "a9afb68710d809570460c29d6c3293219d3624d4", "build_number": "56", "api": "1.0", "nailgun_sha": "5f91157daa6798ff522ca9f6d34e7e135f150a90", "production": "docker", "fuelmain_sha": "45caacadb878abfbd9d60e134d72229698b469c9", "astute_sha": "16b252d93be6aaa73030b8100cf8c5ca6a970a91", "feature_groups": ["mirantis"], "release": "6.0", "fuellib_sha": "73332192a257ea02c40a39885c502ad1ebdf3eda"}}}, "fuellib_sha": "73332192a257ea02c40a39885c502ad1ebdf3eda"

Changed in fuel:
status: Fix Committed → Fix Released