Provision failed due to astute/cobbler timeouts.

Bug #1608700 reported by Nikita Koshikov on 2016-08-01
22
This bug affects 2 people
Affects Status Importance Assigned to Milestone
Fuel for OpenStack
High
Georgy Kibardin
Mitaka
High
Georgy Kibardin

Bug Description

Steps to reproduce:

To deployed env with total number of nodes 120 add 80 new nodes.
Try to provision added nodes.

Expected result - provision should be successful

Actual result provision fails with error:

2016-08-01 06:27:28 WARNING [1444] Cobbler problem. Try to repeat: 1 attempt
2016-08-01 06:28:08 WARNING [1444] Cobbler problem. Try to repeat: 2 attempt
2016-08-01 06:28:48 ERROR [1444] Error occured while provisioning:
#<Net::ReadTimeout: Net::ReadTimeout>

2016-08-01 06:28:48 INFO [1444] Casting message to Nailgun:
{"method"=>"provision_resp",
 "args"=>
  {"task_uuid"=>"3ec17e5f-c596-4c08-8677-22efa082cebc",
   "status"=>"error",
   "error"=>"Net::ReadTimeout",
   "progress"=>100}}

2016-08-01 06:28:49 INFO [1444] 3ec17e5f-c596-4c08-8677-22efa082cebc: stdout: stderr: Warning: Permanently added '172.29.193.78' (ECDSA) to the list of known hosts.
     exit code: 0
2016-08-01 06:28:49 INFO [1444] Provision summary: time was spent 00:20:04
2016-08-01 06:28:49 ERROR [1444] Error running provisioning: Net::ReadTimeout, trace:
["/usr/share/ruby/net/protocol.rb:158:in `rescue in rbuf_fill'",
 "/usr/share/ruby/net/protocol.rb:152:in `rbuf_fill'",
 "/usr/share/ruby/net/protocol.rb:134:in `readuntil'",
 "/usr/share/ruby/net/protocol.rb:144:in `readline'",
 "/usr/share/ruby/net/http/response.rb:39:in `read_status_line'",
 "/usr/share/ruby/net/http/response.rb:28:in `read_new'",
 "/usr/share/ruby/net/http.rb:1406:in `block in transport_request'",
 "/usr/share/ruby/net/http.rb:1403:in `catch'",
 "/usr/share/ruby/net/http.rb:1403:in `transport_request'",
 "/usr/share/ruby/net/http.rb:1376:in `request'",
 "/usr/share/ruby/net/http.rb:1322:in `request_post'",
 "/usr/share/ruby/xmlrpc/client.rb:475:in `do_rpc'",
 "/usr/share/ruby/xmlrpc/client.rb:279:in `call2'",
 "/usr/share/ruby/xmlrpc/client.rb:260:in `call'",
 "/usr/share/gems/gems/astute-9.0.0/lib/astute/cobbler.rb:115:in `sync'",
 "/usr/share/gems/gems/astute-9.0.0/lib/astute/cobbler_manager.rb:173:in `sync'",
 "/usr/share/gems/gems/astute-9.0.0/lib/astute/cobbler_manager.rb:124:in `edit_nodes'",
 "/usr/share/gems/gems/astute-9.0.0/lib/astute/provision.rb:235:in `provision_piece'",
 "/usr/share/gems/gems/astute-9.0.0/lib/astute/provision.rb:122:in `block (3 levels) in provision_and_watch_progress'",
 "/usr/share/gems/gems/astute-9.0.0/lib/astute/provision.rb:336:in `call'",
 "/usr/share/gems/gems/astute-9.0.0/lib/astute/provision.rb:336:in `sleep_not_greater_than'",
 "/usr/share/gems/gems/astute-9.0.0/lib/astute/provision.rb:116:in `block (2 levels) in provision_and_watch_progress'",
 "/usr/share/gems/gems/astute-9.0.0/lib/astute/provision.rb:115:in `loop'",
 "/usr/share/gems/gems/astute-9.0.0/lib/astute/provision.rb:115:in `block in provision_and_watch_progress'",
 "/usr/share/gems/gems/astute-9.0.0/lib/astute/provision.rb:114:in `catch'",
 "/usr/share/gems/gems/astute-9.0.0/lib/astute/provision.rb:114:in `provision_and_watch_progress'",
 "/usr/share/gems/gems/astute-9.0.0/lib/astute/provision.rb:46:in `provision'",
 "/usr/share/gems/gems/astute-9.0.0/lib/astute/orchestrator.rb:127:in `provision'",
 "/usr/share/gems/gems/astute-9.0.0/lib/astute/server/dispatcher.rb:52:in `provision'",
 "/usr/share/gems/gems/astute-9.0.0/lib/astute/server/dispatcher.rb:37:in `image_provision'",
 "/usr/share/gems/gems/astute-9.0.0/lib/astute/server/server.rb:187:in `dispatch_message'",
 "/usr/share/gems/gems/astute-9.0.0/lib/astute/server/server.rb:146:in `block in dispatch'",
 "/usr/share/gems/gems/astute-9.0.0/lib/astute/server/task_queue.rb:64:in `call'",
 "/usr/share/gems/gems/astute-9.0.0/lib/astute/server/task_queue.rb:64:in `block in each'",
 "/usr/share/gems/gems/astute-9.0.0/lib/astute/server/task_queue.rb:56:in `each'",
 "/usr/share/gems/gems/astute-9.0.0/lib/astute/server/task_queue.rb:56:in `each'",
 "/usr/share/gems/gems/astute-9.0.0/lib/astute/server/server.rb:144:in `each_with_index'",
 "/usr/share/gems/gems/astute-9.0.0/lib/astute/server/server.rb:144:in `dispatch'",
 "/usr/share/gems/gems/astute-9.0.0/lib/astute/server/server.rb:121:in `block in perform_main_job'"]

The problem is that "cobbler sync" takes more than 30 second. And default xml::rpm timeout is set to 30.

If we modify cobbler.rb with:

@remote.timeout = 60

Than "cobbler sync" finished, but another problem acquires:

2016-08-01 21:24:16 DEBUG [431] Data received by ProvisiningProxyReporter to report it up:
{"status"=>"error", "error"=>"end of file reached", "progress"=>100}

2016-08-01 21:24:16 DEBUG [431] Data send by DeploymentProxyReporter to report it up:
{"status"=>"error", "error"=>"end of file reached", "progress"=>100}

2016-08-01 21:24:16 INFO [431] Casting message to Nailgun:
{"method"=>"provision_resp",
 "args"=>
  {"task_uuid"=>"0b0e2702-7eb4-4d7e-b785-70933c72aa3e",
   "status"=>"error",
   "error"=>"end of file reached",
   "progress"=>100}}

2016-08-01 21:24:16 DEBUG [431] 0b0e2702-7eb4-4d7e-b785-70933c72aa3e: MC agent 'execute_shell_command', method 'execute', results:
{:sender=>"33",
 :statuscode=>0,
 :statusmsg=>"OK",
 :data=>{:stdout=>"", :stderr=>"", :exit_code=>0}}

2016-08-01 21:24:16 DEBUG [431] Unlock discovery for failed nodes. Result: [{"uid"=>"33", "exit code"=>0}]
2016-08-01 21:24:16 DEBUG [431] 0b0e2702-7eb4-4d7e-b785-70933c72aa3e: MC agent 'execute_shell_command', method 'execute', results:
{:sender=>"master",
 :statuscode=>0,
 :statusmsg=>"OK",
 :data=>
  {:stdout=>"",
   :stderr=>
    "Warning: Permanently added '172.29.193.78' (ECDSA) to the list of known hosts.\r\n",
   :exit_code=>0}}

2016-08-01 21:24:16 INFO [431] 0b0e2702-7eb4-4d7e-b785-70933c72aa3e: stdout: stderr: Warning: Permanently added '172.29.193.78' (ECDSA) to the list of known hosts.
     exit code: 0
2016-08-01 21:24:16 INFO [431] Provision summary: time was spent 00:01:41
2016-08-01 21:24:16 ERROR [431] Error running provisioning: end of file reached, trace:
["/usr/share/ruby/net/protocol.rb:153:in `read_nonblock'",
 "/usr/share/ruby/net/protocol.rb:153:in `rbuf_fill'",
 "/usr/share/ruby/net/protocol.rb:134:in `readuntil'",
 "/usr/share/ruby/net/protocol.rb:144:in `readline'",
 "/usr/share/ruby/net/http/response.rb:39:in `read_status_line'",
 "/usr/share/ruby/net/http/response.rb:28:in `read_new'",
 "/usr/share/ruby/net/http.rb:1406:in `block in transport_request'",
 "/usr/share/ruby/net/http.rb:1403:in `catch'",
 "/usr/share/ruby/net/http.rb:1403:in `transport_request'",
 "/usr/share/ruby/net/http.rb:1376:in `request'",
 "/usr/share/ruby/net/http.rb:1322:in `request_post'",
 "/usr/share/ruby/xmlrpc/client.rb:475:in `do_rpc'",
 "/usr/share/ruby/xmlrpc/client.rb:279:in `call2'",
 "/usr/share/ruby/xmlrpc/client.rb:260:in `call'",
 "/usr/share/gems/gems/astute-9.0.0/lib/astute/cobbler.rb:93:in `item_exists'",
 "/usr/share/gems/gems/astute-9.0.0/lib/astute/cobbler.rb:108:in `system_exists?'",
 "/usr/share/gems/gems/astute-9.0.0/lib/astute/cobbler.rb:161:in `netboot'",
 "/usr/share/gems/gems/astute-9.0.0/lib/astute/cobbler_manager.rb:132:in `block in netboot_nodes'",
 "/usr/share/gems/gems/astute-9.0.0/lib/astute/cobbler_manager.rb:128:in `each'",
 "/usr/share/gems/gems/astute-9.0.0/lib/astute/cobbler_manager.rb:128:in `netboot_nodes'",
 "/usr/share/gems/gems/astute-9.0.0/lib/astute/provision.rb:180:in `remove_nodes'",
 "/usr/share/gems/gems/astute-9.0.0/lib/astute/provision.rb:526:in `prepare_nodes'",
 "/usr/share/gems/gems/astute-9.0.0/lib/astute/provision.rb:45:in `provision'",
 "/usr/share/gems/gems/astute-9.0.0/lib/astute/orchestrator.rb:127:in `provision'",
 "/usr/share/gems/gems/astute-9.0.0/lib/astute/server/dispatcher.rb:52:in `provision'",
 "/usr/share/gems/gems/astute-9.0.0/lib/astute/server/dispatcher.rb:37:in `image_provision'",
 "/usr/share/gems/gems/astute-9.0.0/lib/astute/server/server.rb:187:in `dispatch_message'",
 "/usr/share/gems/gems/astute-9.0.0/lib/astute/server/server.rb:146:in `block in dispatch'",
 "/usr/share/gems/gems/astute-9.0.0/lib/astute/server/task_queue.rb:64:in `call'",
 "/usr/share/gems/gems/astute-9.0.0/lib/astute/server/task_queue.rb:64:in `block in each'",
 "/usr/share/gems/gems/astute-9.0.0/lib/astute/server/task_queue.rb:56:in `each'",
 "/usr/share/gems/gems/astute-9.0.0/lib/astute/server/task_queue.rb:56:in `each'",
 "/usr/share/gems/gems/astute-9.0.0/lib/astute/server/server.rb:144:in `each_with_index'",
 "/usr/share/gems/gems/astute-9.0.0/lib/astute/server/server.rb:144:in `dispatch'",
 "/usr/share/gems/gems/astute-9.0.0/lib/astute/server/server.rb:121:in `block in perform_main_job'"]

The root cause of this is apache proxy, that breaks connection. It can be tuned with:
cat /etc/httpd/conf.modules.d/20-reqtimeout.conf
<IfModule reqtimeout_module>
      RequestReadTimeout header=60 body=60
</IfModule>

After this - provision started to work.

We need to make this parameters configurable.

description: updated
Dmitry Pyzhov (dpyzhov) on 2016-08-04
no longer affects: fuel/newton
Changed in fuel:
assignee: nobody → Fuel Sustaining (fuel-sustaining-team)
status: New → Confirmed
tags: added: area-astute area-python
Dmitry Pyzhov (dpyzhov) on 2016-08-04
tags: added: 9.1-proposed
Georgy Kibardin (gkibardin) wrote :

Could you please gather a diagnostic snapshot.

Changed in fuel:
status: Confirmed → Incomplete
Changed in fuel:
status: Incomplete → Confirmed
Dmitry Pyzhov (dpyzhov) on 2016-09-06
Changed in fuel:
assignee: Fuel Sustaining (fuel-sustaining-team) → Georgy Kibardin (gkibardin)
Georgy Kibardin (gkibardin) wrote :

I think, anyway we need to take a look on how we work with cobbler and why it is so slow. But, for 9.1 it would be reasonable just to increase the timeout.

tags: added: area-library

Reviewed: https://review.openstack.org/366755
Committed: https://git.openstack.org/cgit/openstack/fuel-library/commit/?id=f95bb52c8f4983e22fcec227381d012a7a7f047c
Submitter: Jenkins
Branch: stable/mitaka

commit f95bb52c8f4983e22fcec227381d012a7a7f047c
Author: Maksim Malchuk <email address hidden>
Date: Wed Sep 7 16:25:34 2016 +0300

    Configure Cobbler's Apache with reqtimeout module

    This change adds the reqtimeout module to the Cobbler's Apache
    configuration to reduce the risk of timeouts on large clusters.

    Change-Id: Ife2c421145a616c5033efca62870eff64c274217
    Related-Bug: #1608700
    Signed-off-by: Maksim Malchuk <email address hidden>

tags: added: in-stable-mitaka
Nikita Koshikov (nkoshikov) wrote :

This change is not enough. There is second part - that requires code to astute. Please update bug accordingly

Maksim Malchuk (mmalchuk) wrote :

Nikita, this is 'Related fix' as you can see in the #3 and #4.

Georgy Kibardin (gkibardin) wrote :

Aha, my bad, I've forgotten about astute timeout.

Fix proposed to branch: master
Review: https://review.openstack.org/367240

Changed in fuel:
status: Confirmed → In Progress

Change abandoned by Georgy Kibardin (<email address hidden>) on branch: master
Review: https://review.openstack.org/367240

Changed in fuel:
status: In Progress → Confirmed

Reviewed: https://review.openstack.org/367205
Committed: https://git.openstack.org/cgit/openstack/fuel-astute/commit/?id=f030161d190f06b8c8fd3c158b46b4625b3f5151
Submitter: Jenkins
Branch: stable/mitaka

commit f030161d190f06b8c8fd3c158b46b4625b3f5151
Author: Georgy Kibardin <email address hidden>
Date: Thu Sep 8 11:29:56 2016 +0300

    Increase xml rpc timeout

    With hundreds of nodes Cobbler sync cannot fit default 30 secods
    timeout. Cobbler performance is going to be investigated in the next
    release. By now lets just increase the timeout.

    Change-Id: Ief8ff93fc808549e8d729040512a266b0c09383d
    Closes-Bug: #1608700

tags: added: scale
Changed in fuel:
status: Confirmed → In Progress
Georgy Kibardin (gkibardin) wrote :

I would like to look at what exactly cobblers spends time for. Please attach cobbler logs.

Changed in fuel:
status: In Progress → Incomplete
Changed in fuel:
status: Incomplete → In Progress
tags: added: release-notes
tags: added: release-notes-done
removed: release-notes

Reviewed: https://review.openstack.org/367240
Committed: https://git.openstack.org/cgit/openstack/fuel-astute/commit/?id=0e93c8b6c8c0b4c8fd8401a073c4611811305c7f
Submitter: Jenkins
Branch: master

commit 0e93c8b6c8c0b4c8fd8401a073c4611811305c7f
Author: Georgy Kibardin <email address hidden>
Date: Thu Sep 8 11:29:56 2016 +0300

    Increase xml rpc timeout

    With hundreds of nodes Cobbler sync cannot fit default 30 secods
    timeout. Cobbler performance is going to be investigated in the next
    release. By now lets just increase the timeout.

    Change-Id: Ief8ff93fc808549e8d729040512a266b0c09383d
    Closes-Bug: #1608700
    (cherry picked from commit f030161d190f06b8c8fd3c158b46b4625b3f5151)

Changed in fuel:
status: In Progress → Fix Committed

This issue was fixed in the openstack/fuel-astute 10.0.0rc1 release candidate.

This issue was fixed in the openstack/fuel-astute 10.0.0 release.

Change abandoned by Dmitry Ilyin (<email address hidden>) on branch: master
Review: https://review.openstack.org/430935

To post a comment you must log in.
This report contains Public information  Edit
Everyone can see this information.

Duplicates of this bug

Other bug subscribers