Nailgun agent execution expired

Bug #1340725 reported by Nastya Urlapova
This bug affects 1 person
Affects             Status   Importance  Assigned to  Milestone
Fuel for OpenStack  Invalid  High        Fuel DevOps

Bug Description

{
build_id: "2014-07-10_00-39-56",
mirantis: "yes",
build_number: "112",
ostf_sha: "09b6bccf7d476771ac859bb3c76c9ebec9da9e1f",
nailgun_sha: "f5ff82558f99bb6ca7d5e1617eddddf7142fe857",
production: "docker",
api: "1.0",
fuelmain_sha: "293015843304222ead899270449495af91b06aed",
astute_sha: "5df009e8eab611750309a4c5b5c9b0f7b9d85806",
release: "5.0.1",
fuellib_sha: "364dee37435cbdc85d6b814a61f57800b83bf22d"
}

Deploy a simple cluster: Ubuntu, 1 x Controller, 2 x Compute, Neutron GRE.
Deployment of the environment completes.
Snapshot the environment.
After reverting the environment, all nodes are offline.
Running the nailgun agent fails with the error:
[2014-07-11T12:17:20.200088 #27568] ERROR -- : execution expired
/usr/lib/ruby/vendor_ruby/httpclient/timeout.rb:43:in `new'
/usr/lib/ruby/vendor_ruby/httpclient/session.rb:803:in `create_socket'
/usr/lib/ruby/vendor_ruby/httpclient/session.rb:752:in `connect'
/usr/lib/ruby/vendor_ruby/httpclient/timeout.rb:131:in `timeout'
/usr/lib/ruby/vendor_ruby/httpclient/session.rb:751:in `connect'
/usr/lib/ruby/vendor_ruby/httpclient/session.rb:609:in `query'
/usr/lib/ruby/vendor_ruby/httpclient/session.rb:164:in `query'
/usr/lib/ruby/vendor_ruby/httpclient.rb:1083:in `do_get_block'
/usr/lib/ruby/vendor_ruby/httpclient.rb:887:in `do_request'
/usr/lib/ruby/vendor_ruby/httpclient.rb:981:in `protect_keep_alive_disconnected'
/usr/lib/ruby/vendor_ruby/httpclient.rb:886:in `do_request'
/usr/lib/ruby/vendor_ruby/httpclient.rb:774:in `request'
/usr/lib/ruby/vendor_ruby/httpclient.rb:689:in `put'
/opt/nailgun/bin/agent:152:in `put'
/opt/nailgun/bin/agent:557

mco ping works fine.

Vladimir Sharshov (vsharshov) wrote :

Guys, there is a problem with API port accessibility from the cluster nodes.

Master ip: 10.108.10.2

From the master node: curl 10.108.10.2:8000/api/nodes (returns success)

From cluster node:

root@master:~# ping 10.108.10.2
PING 10.108.10.2 (10.108.10.2) 56(84) bytes of data.
64 bytes from 10.108.10.2: icmp_req=1 ttl=64 time=0.339 ms
64 bytes from 10.108.10.2: icmp_req=2 ttl=64 time=0.191 ms
^C
--- 10.108.10.2 ping statistics ---
2 packets transmitted, 2 received, 0% packet loss, time 1000ms
rtt min/avg/max/mdev = 0.191/0.265/0.339/0.074 ms

root@master:~# curl 10.108.10.2:80/api/nodes
<!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML 2.0//EN">
<html><head>
<title>301 Moved Permanently</title>
</head><body>
<h1>Moved Permanently</h1>
<p>The document has moved <a href="http://10.108.10.2:8000/api/nodes">here</a>.</p>
<hr>
<address>Apache/2.2.15 (CentOS) Server at 10.108.10.2 Port 80</address>
</body></html>

But:

root@master:~# curl 10.108.10.2:8000/api/nodes
curl: (7) couldn't connect to host
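
The checks above (ping succeeds, port 80 answers, port 8000 does not) can be repeated from a cluster node with a small probe. This is only a minimal sketch, assuming Python 3 is available on the node. The helper name `probe_tcp` and the 3-second default timeout are illustrative choices, not part of Fuel. It separates a refused connection from a silent timeout, and a timeout is what the agent's "execution expired" suggests.

```python
import socket


def probe_tcp(host, port, timeout=3.0):
    """Classify a TCP connect attempt as 'open', 'refused', 'timeout' or 'error'.

    probe_tcp is a hypothetical helper for illustration, not a Fuel tool.
    """
    try:
        # create_connection completes the TCP handshake or raises.
        with socket.create_connection((host, port), timeout=timeout):
            return "open"
    except ConnectionRefusedError:
        # The peer (or a firewall REJECT rule) answered with a reset.
        return "refused"
    except (socket.timeout, TimeoutError):
        # No usable answer within the timeout: packets dropped or replies lost.
        return "timeout"
    except OSError:
        # Anything else, for example no route to host.
        return "error"
```

Calling `probe_tcp("10.108.10.2", 80)` and `probe_tcp("10.108.10.2", 8000)` from a cluster node corresponds to the two curl attempts in this comment.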

Vladimir Sharshov (vsharshov) wrote :

Cluster node:

root@master:~# telnet 10.108.10.2 80
Trying 10.108.10.2...
Connected to 10.108.10.2.
Escape character is '^]'.
^X
<!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML 2.0//EN">
<html><head>
<title>301 Moved Permanently</title>
</head><body>
<h1>Moved Permanently</h1>
<p>The document has moved <a href="http://:8000/">here</a>.</p>
<hr>
<address>Apache/2.2.15 (CentOS) Server at 172.17.0.10 Port 80</address>
</body></html>
Connection closed by foreign host.

But:

root@master:~# telnet 10.108.10.2 8000
Trying 10.108.10.2...

Master node:

[root@nailgun ~]# telnet 10.108.10.2 8000
Trying 10.108.10.2...
Connected to 10.108.10.2.
Escape character is '^]'.
^X
<html>
<head><title>400 Bad Request</title></head>
<body bgcolor="white">
<center><h1>400 Bad Request</h1></center>
<hr><center>nginx/1.0.15</center>
</body>
</html>
Connection closed by foreign host.

xreuze (xreuze) wrote : Re: [Bug 1340725] [NEW] Nailgun agent execution expired

This is 3rd party, done by python GitHub.

Should we have support for this in any way?

Let me look at it. It seems to me that they need a kick in the lib file.

On Fri, Jul 11, 2014 at 3:40 PM, Nastya Urlapova <email address hidden>
wrote:

> Public bug reported:
>
> {
> build_id: "2014-07-10_00-39-56",
> mirantis: "yes",
> build_number: "112",
> ostf_sha: "09b6bccf7d476771ac859bb3c76c9ebec9da9e1f",
> nailgun_sha: "f5ff82558f99bb6ca7d5e1617eddddf7142fe857",
> production: "docker",
> api: "1.0",
> fuelmain_sha: "293015843304222ead899270449495af91b06aed",
> astute_sha: "5df009e8eab611750309a4c5b5c9b0f7b9d85806",
> release: "5.0.1",
> fuellib_sha: "364dee37435cbdc85d6b814a61f57800b83bf22d"
> }
>
> Deploy simple cluster - Ubuntu, 1xController, 2xComputer, neutron gre
> Deployment of environment is done.
> Snapshot env
> After revert env all nodes are offline.
> Execution nailgun agent failed with error:
> [2014-07-11T12:17:20.200088 #27568] ERROR -- : execution expired
> /usr/lib/ruby/vendor_ruby/httpclient/timeout.rb:43:in
> `new'/usr/lib/ruby/vendor_ruby/httpclient/session.rb:803:in
> `create_socket'/usr/lib/ruby/vendor_ruby/httpclient/session.rb:752:in
> `connect'/usr/lib/ruby/vendor_ruby/httpclient/timeout.rb:131:in
> `timeout'/usr/lib/ruby/vendor_ruby/httpclient/session.rb:751:in
> `connect'/usr/lib/ruby/vendor_ruby/httpclient/session.rb:609:in
> `query'/usr/lib/ruby/vendor_ruby/httpclient/session.rb:164:in
> `query'/usr/lib/ruby/vendor_ruby/httpclient.rb:1083:in
> `do_get_block'/usr/lib/ruby/vendor_ruby/httpclient.rb:887:in
> `do_request'/usr/lib/ruby/vendor_ruby/httpclient.rb:981:in
> `protect_keep_alive_disconnected'/usr/lib/ruby/vendor_ruby/httpclient.rb:886:in
> `do_request'/usr/lib/ruby/vendor_ruby/httpclient.rb:774:in
> `request'/usr/lib/ruby/vendor_ruby/httpclient.rb:689:in
> `put'/opt/nailgun/bin/agent:152:in `put'/opt/nailgun/bin/agent:557
>
> mco ping works fine.
>
> ** Affects: fuel
> Importance: High
> Assignee: Fuel Python Team (fuel-python)
> Status: New
>
> ** Attachment added: "fuel-snapshot-2014-07-11_12-37-54.tgz"
>
> https://bugs.launchpad.net/bugs/1340725/+attachment/4150367/+files/fuel-snapshot-2014-07-11_12-37-54.tgz
>

Vladimir Sharshov (vsharshov) wrote :

I also checked the iptables rules and found a suspicious rule order. After an iptables restart the order changed back to the usual one, but the problem was not solved (thanks to Matt for the help).

Was:

[root@nailgun ~]# iptables -nL --line-numbers | grep 8000
2 ACCEPT tcp -- 0.0.0.0/0 0.0.0.0/0 state NEW tcp dpt:8000
2 ACCEPT tcp -- 0.0.0.0/0 172.17.0.9 tcp dpt:8000
8 ACCEPT tcp -- 0.0.0.0/0 0.0.0.0/0 state NEW tcp dpt:8000

Now:

[root@nailgun ~]# iptables -nL --line-numbers | grep 8000
2 ACCEPT tcp -- 0.0.0.0/0 0.0.0.0/0 state NEW tcp dpt:8000
2 ACCEPT tcp -- 0.0.0.0/0 0.0.0.0/0 state NEW tcp dpt:8000
4 ACCEPT tcp -- 0.0.0.0/0 172.17.0.9 tcp dpt:8000

Changed in fuel:
status: New → Triaged
Aleksandr Didenko (adidenko) wrote :

Guys, the problem was a MASQUERADE rule on the host system (mc2n1-srt). I've saved the rules in the /root/iptables.11-07-2014.save file. Here is the problematic rule:

72 49641 2978K MASQUERADE tcp -- * * 0.0.0.0/0 10.108.10.2 tcp dpt:8000
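
A rule like this can be spotted by scanning the output of `iptables -t nat -L POSTROUTING -n -v --line-numbers` on the host. This is a minimal sketch, assuming the column layout of the line above (num, pkts, bytes, target, prot, opt, in, out, source, destination, then match options). `find_masquerade_rules` is a hypothetical helper, not part of iptables or Fuel.

```python
import re


def find_masquerade_rules(listing, port):
    """Return (rule_number, destination) for MASQUERADE rules matching dpt:<port>.

    Assumes verbose, numeric iptables output with line numbers.
    """
    matches = []
    for line in listing.splitlines():
        fields = line.split()
        # Expect: num pkts bytes target prot opt in out source destination [options]
        if len(fields) < 10 or not fields[0].isdigit():
            continue  # chain title, column header, or blank line
        target, destination = fields[3], fields[9]
        options = " ".join(fields[10:])
        if target == "MASQUERADE" and re.search(r"\bdpt:%d\b" % port, options):
            matches.append((int(fields[0]), destination))
    return matches
```

The returned rule number is the one that `iptables -t nat -D POSTROUTING <num>` expects. Rule numbers shift after every deletion, so the listing should be re-read before removing a second rule.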

So the host system was masquerading all packets addressed to port 8000 of the nailgun virtual machine, and all traffic looked like this on the nailgun node (src IP 10.108.10.1):

14:38:41.260267 IP 10.108.10.1.41694 > 10.108.10.2.8000: Flags [P.], seq 1:169, ack 1, win 229, options [nop,nop,TS val 3326444 ecr 14921834], length 168
14:38:41.260283 IP 10.108.10.2.8000 > 10.108.10.1.41694: Flags [.], ack 169, win 122, options [nop,nop,TS val 14921834 ecr 3326444], length 0
14:38:41.262480 IP 10.108.10.2.8000 > 10.108.10.1.41694: Flags [P.], seq 1:870, ack 169, win 122, options [nop,nop,TS val 14921837 ecr 3326444], length 869
14:38:41.262656 IP 10.108.10.1.41694 > 10.108.10.2.8000: Flags [.], ack 870, win 242, options [nop,nop,TS val 3326445 ecr 14921837], length 0
14:38:41.262765 IP 10.108.10.1.41694 > 10.108.10.2.8000: Flags [F.], seq 169, ack 870, win 242, options [nop,nop,TS val 3326445 ecr 14921837], length 0

I've removed that rule (iptables -t nat -D POSTROUTING 72). Now traffic looks like this, and port 8000 is accessible from the slaves:

14:46:15.272479 IP 10.108.10.4.59286 > 10.108.10.2.8000: Flags [S], seq 3471309015, win 29200, options [mss 1460,sackOK,TS val 3434378 ecr 0,nop,wscale 7], length 0
14:46:15.289045 IP 10.108.10.2.8000 > 10.108.10.4.59286: Flags [S.], seq 2586277656, ack 3471309016, win 14480, options [mss 1460,sackOK,TS val 15375863 ecr 3434378,nop,wscale 7], length 0
14:42:55.585269 IP 10.108.10.5.41699 > 10.108.10.2.8000: Flags [S], seq 1437031390, win 29200, options [mss 1460,sackOK,TS val 3390026 ecr 0,nop,wscale 7], length 0
14:42:55.622011 IP 10.108.10.2.8000 > 10.108.10.5.41699: Flags [S.], seq 1512171767, ack 1437031391, win 14480, options [mss 1460,sackOK,TS val 15176196 ecr 3390026,nop,wscale 7], length 0

Please be more careful with NAT rules on the host system.
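
The visible difference between the two captures is the source address seen on the nailgun node: 10.108.10.1 while the rule was in place, and the slaves' own addresses (10.108.10.4, 10.108.10.5) after it was removed. This is a minimal sketch for checking that in a saved capture, assuming tcpdump's one-line text output for IPv4 as pasted above. `client_sources` is a hypothetical helper name.

```python
import re

# Matches the "IP <src>.<sport> > <dst>.<dport>:" part of a tcpdump text line.
_PACKET = re.compile(
    r"IP (\d+\.\d+\.\d+\.\d+)\.(\d+) > (\d+\.\d+\.\d+\.\d+)\.(\d+):"
)


def client_sources(capture, server_ip, server_port):
    """Return the set of source IPs that sent packets to server_ip:server_port."""
    sources = set()
    for line in capture.splitlines():
        match = _PACKET.search(line)
        if not match:
            continue  # not an IPv4 packet line in the expected format
        src_ip, _src_port, dst_ip, dst_port = match.groups()
        if dst_ip == server_ip and int(dst_port) == server_port:
            sources.add(src_ip)
    return sources
```

If several slaves are polling the API and the set contains only one gateway address, a NAT rule on the path is a likely suspect.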

Dima Shulyak (dshulyak)
Changed in fuel:
assignee: Fuel Python Team (fuel-python) → Fuel DevOps (fuel-devops)
Changed in fuel:
status: Triaged → Invalid