483 node deployment Error: Provision has failed. end of file reached

Bug #1612597 reported by Sergey Galkin
This bug affects 1 person
Affects             Status  Importance  Assigned to  Milestone
Fuel for OpenStack  New     Undecided   Unassigned

Bug Description

I am trying to create a cluster with about 480 nodes. VKuklin said that this is possible if the nodes are deployed step by step, with 200 nodes per step.

Deployment of the first 170 nodes failed with "Error: Provision has failed. end of file reached".
All nodes went into the Error state.

From astute.log:

2016-08-12 08:10:27 ERROR [20130] Error running provisioning: end of file reached, trace:
["/usr/share/ruby/net/protocol.rb:153:in `read_nonblock'",
 "/usr/share/ruby/net/protocol.rb:153:in `rbuf_fill'",
 "/usr/share/ruby/net/protocol.rb:134:in `readuntil'",
 "/usr/share/ruby/net/protocol.rb:144:in `readline'",
 "/usr/share/ruby/net/http/response.rb:39:in `read_status_line'",
 "/usr/share/ruby/net/http/response.rb:28:in `read_new'",
 "/usr/share/ruby/net/http.rb:1406:in `block in transport_request'",
 "/usr/share/ruby/net/http.rb:1403:in `catch'",
 "/usr/share/ruby/net/http.rb:1403:in `transport_request'",
 "/usr/share/ruby/net/http.rb:1376:in `request'",
 "/usr/share/ruby/net/http.rb:1322:in `request_post'",
 "/usr/share/ruby/xmlrpc/client.rb:475:in `do_rpc'",
 "/usr/share/ruby/xmlrpc/client.rb:279:in `call2'",
 "/usr/share/ruby/xmlrpc/client.rb:260:in `call'",
 "/usr/share/gems/gems/astute-9.0.0/lib/astute/cobbler.rb:89:in `item_exists'",
 "/usr/share/gems/gems/astute-9.0.0/lib/astute/cobbler.rb:101:in `system_exists?'",
 "/usr/share/gems/gems/astute-9.0.0/lib/astute/cobbler.rb:149:in `netboot'",
 "/usr/share/gems/gems/astute-9.0.0/lib/astute/cobbler_manager.rb:131:in `block in netboot_nodes'",
 "/usr/share/gems/gems/astute-9.0.0/lib/astute/cobbler_manager.rb:127:in `each'",
 "/usr/share/gems/gems/astute-9.0.0/lib/astute/cobbler_manager.rb:127:in `netboot_nodes'",
 "/usr/share/gems/gems/astute-9.0.0/lib/astute/provision.rb:180:in `remove_nodes'",
 "/usr/share/gems/gems/astute-9.0.0/lib/astute/provision.rb:526:in `prepare_nodes'",
 "/usr/share/gems/gems/astute-9.0.0/lib/astute/provision.rb:45:in `provision'",
 "/usr/share/gems/gems/astute-9.0.0/lib/astute/orchestrator.rb:128:in `provision'",
 "/usr/share/gems/gems/astute-9.0.0/lib/astute/server/dispatcher.rb:51:in `provision'",
 "/usr/share/gems/gems/astute-9.0.0/lib/astute/server/dispatcher.rb:37:in `image_provision'",
 "/usr/share/gems/gems/astute-9.0.0/lib/astute/server/server.rb:187:in `dispatch_message'",
 "/usr/share/gems/gems/astute-9.0.0/lib/astute/server/server.rb:146:in `block in dispatch'",
 "/usr/share/gems/gems/astute-9.0.0/lib/astute/server/task_queue.rb:64:in `call'",
 "/usr/share/gems/gems/astute-9.0.0/lib/astute/server/task_queue.rb:64:in `block in each'",
 "/usr/share/gems/gems/astute-9.0.0/lib/astute/server/task_queue.rb:56:in `each'",
 "/usr/share/gems/gems/astute-9.0.0/lib/astute/server/task_queue.rb:56:in `each'",
 "/usr/share/gems/gems/astute-9.0.0/lib/astute/server/server.rb:144:in `each_with_index'",
 "/usr/share/gems/gems/astute-9.0.0/lib/astute/server/server.rb:144:in `dispatch'",
 "/usr/share/gems/gems/astute-9.0.0/lib/astute/server/server.rb:121:in `block in perform_main_job'"]

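The trace ends in Net::HTTP's read_status_line, called through XMLRPC::Client from Astute's cobbler.rb (item_exists). "end of file reached" at that point usually means the server side closed the connection before sending any response. As an illustration only, here is a minimal sketch of retrying such a call when EOFError is raised. The helper name with_retries and its parameters are hypothetical and are not part of Astute; the block stands in for the real XML-RPC request.

```ruby
# Hypothetical helper, not part of Astute: re-run a block when the peer
# closes the connection before answering (EOFError by default).
def with_retries(attempts: 3, delay: 0, on: [EOFError])
  tries = 0
  begin
    tries += 1
    yield tries
  rescue *on
    # Give up and re-raise the original error once attempts are exhausted.
    raise if tries >= attempts
    sleep(delay) if delay > 0
    retry
  end
end

# Stand-in for an XML-RPC call that fails twice and then succeeds.
calls = 0
result = with_retries(attempts: 3) do
  calls += 1
  raise EOFError, 'end of file reached' if calls < 3
  :ok
end
puts "result=#{result} after #{calls} calls"
```

A retry like this would only hide the symptom. Whether it is appropriate depends on why the connection is being closed.
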
Tags: scale

Sergey Galkin (sgalkin) wrote :

Can't attach the snapshot because shotgun failed 2 times with a segfault:
[181629.653207] shotgun[14731]: segfault at 58 ip 00007f3311996907 sp 00007f321b7fd800 error 4 in libpython2.7.so.1.0[7f33118ba000+178000]
[183937.878394] shotgun[11114]: segfault at 58 ip 00007f5a18f72907 sp 00007f584bfe6800 error 4 in libpython2.7.so.1.0[7f5a18e96000+178000]

/var/log is available at http://mos-scale-share.mirantis.com/fuel-log-bug-1612597.tar.gz

Sergey Galkin (sgalkin) wrote :

[root@fuel ~]# shotgun2 short-report
cat /etc/fuel_build_id:
 598
cat /etc/fuel_build_number:
 598
cat /etc/fuel_release:
 9.0
cat /etc/fuel_openstack_version:
 mitaka-9.0
rpm -qa | egrep 'fuel|astute|network-checker|nailgun|packetary|shotgun':
 fuel-release-9.0.0-1.mos6349.noarch
 fuelmenu-9.0.0-1.mos274.noarch
 fuel-notify-9.0.0-1.mos8460.noarch
 fuel-ostf-9.0.0-1.mos936.noarch
 fuel-provisioning-scripts-9.0.0-1.mos8743.noarch
 fuel-mirror-9.0.0-1.mos141.noarch
 fuel-openstack-metadata-9.0.0-1.mos8743.noarch
 rubygem-astute-9.0.0-1.mos750.noarch
 fuel-misc-9.0.0-1.mos8460.noarch
 python-fuelclient-9.0.0-1.mos325.noarch
 fuel-9.0.0-1.mos6349.noarch
 fuel-utils-9.0.0-1.mos8460.noarch
 fuel-setup-9.0.0-1.mos6349.noarch
 nailgun-mcagents-9.0.0-1.mos750.noarch
 fuel-library9.0-9.0.0-1.mos8460.noarch
 network-checker-9.0.0-1.mos74.x86_64
 fuel-agent-9.0.0-1.mos285.noarch
 fuel-ui-9.0.0-1.mos2717.noarch
 fuel-migrate-9.0.0-1.mos8460.noarch
 python-packetary-9.0.0-1.mos141.noarch
 fuel-bootstrap-cli-9.0.0-1.mos285.noarch
 shotgun-9.0.0-1.mos90.noarch
 fuel-nailgun-9.0.0-1.mos8743.noarch

Sergey Galkin (sgalkin) wrote :

Resetting the cluster, disabling "Nova quotas" and "Resume guests state on host boot", and redeploying on the same nodes fixed the deployment.

Sergey Galkin (sgalkin) wrote :

The issue is not reproducible with the following in /etc/httpd/conf.modules.d/20-reqtimeout.conf:
<IfModule reqtimeout_module>
     RequestReadTimeout header=60 body=60
</IfModule>

Sergey Galkin (sgalkin) wrote :

Sorry, the content of /etc/httpd/conf.modules.d/20-reqtimeout.conf is actually:
<IfModule reqtimeout_module>
     RequestReadTimeout header=120 body=120
</IfModule>
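
To compare this setting across Fuel masters, a rough sketch like the one below can pull the header and body timeouts out of the file's text. The helper name request_read_timeouts is hypothetical. It only reads the first number after header= or body=, so a range such as header=20-40 yields just the initial timeout and any MinRate part is ignored.

```ruby
# Hypothetical helper: extract the header/body timeouts (in seconds) from
# the first RequestReadTimeout directive found in the given config text.
def request_read_timeouts(conf_text)
  line = conf_text.lines.find { |l| l.strip.start_with?('RequestReadTimeout') }
  return {} unless line
  # Only the first number after "header=" / "body=" is captured.
  line.scan(/(header|body)=(\d+)/).map { |stage, secs| [stage.to_sym, secs.to_i] }.to_h
end

conf = <<~CONF
  <IfModule reqtimeout_module>
       RequestReadTimeout header=120 body=120
  </IfModule>
CONF

timeouts = request_read_timeouts(conf)
puts "header=#{timeouts[:header]} body=#{timeouts[:body]}"
```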
