Requests to meta-data service do not timeout

Bug #1218651 reported by Andrew Glen-Young
This bug affects 3 people
Affects         Status     Importance  Assigned to  Milestone
Canonical Juju  Expired    Wishlist    Unassigned
juju-core       Won't Fix  Medium      Unassigned

Bug Description

Description:

Requests made by juju to the OpenStack metadata service do not time out (or take too long to time out) if the service stops responding. This eventually results in a denial of service (DoS) against the metadata service.

How we discovered the issue:

We encountered a problem with the OpenStack (Folsom) metadata service not responding to requests. On closer inspection, we noticed that hundreds of TCP connections to the metadata service were "stuck" in the ESTABLISHED state, causing the service to stop responding to clients and effectively DoS'ing it. After tracing the connections, we found that they all originated from instances deployed by juju-core. Each of these instances held multiple connections to the metadata service, which is unusual given that the service is intended for short-lived HTTP requests for simple instance metadata.

Conclusions and assumptions:

Without doubt the nova-api-metadata service has a bug and shouldn't maintain connections for so long. However, juju should time out connections that don't respond within a reasonable period. We can't expect all services to be good citizens.

A quick search reveals that the lack of connection (and other) timeouts seems to be a common problem with the Go net/http library.

System and other information:

I tried to capture relevant information before restarting the nova-api-metadata service.

ubuntu@juju-instance:~$ sudo netstat -anp
    [...]
tcp 0 0 10.33.16.66:43086 169.254.169.254:80 ESTABLISHED 9467/jujud
tcp 0 0 10.33.16.66:43112 169.254.169.254:80 ESTABLISHED 15419/jujud
tcp 0 0 10.33.16.66:43120 169.254.169.254:80 ESTABLISHED 16558/jujud
tcp 0 0 10.33.16.66:43126 169.254.169.254:80 ESTABLISHED 17071/jujud
    [...]

ubuntu@juju-instance:~$ /var/lib/juju/tools/unit-ciaas-landscape-client-0/jujud --version
1.13.2.1-precise-amd64

root@meta-data-server:~# lsof -p 18798 | grep -cE TCP.*8775 # pid 18798 is the nova-api-metadata service
1002

root@meta-data-server:~# lsof -np 18798 | egrep 8775 | egrep -o 10.33.[0-9.]+ | sort | uniq -c | sort -n | tail # number of connections by IP
     23 10.33.16.241
     23 10.33.16.63
     24 10.33.16.231
     24 10.33.16.233
     24 10.33.16.234
     24 10.33.16.235
     24 10.33.16.236
     24 10.33.16.66
     25 10.33.16.120
     30 10.33.16.128

Please let me know if you need further information.

Ian Booth (wallyworld)
Changed in juju-core:
importance: Undecided → High
status: New → Triaged
John A Meinel (jameinel) wrote : Re: [Bug 1218651] Re: Requests to meta-data service do not timeout


On 2013-08-30 5:02, Ian Booth wrote:
> ** Changed in: juju-core Importance: Undecided => High
>
> ** Changed in: juju-core Status: New => Triaged
>

Unfortunately, the standard go idiom for handling a connection that
might block is:

go readFromSocketOntoChannel(sock, fromSockChan)
select {
  case data := <-fromSockChan:
    // do stuff with data
  case <-time.After(timeout):
    // do other stuff
}

AFAIK that means readFromSocketOntoChannel never times out and never
stops reading from the socket; we just stop waiting for the data to
come out of the channel.

Now it is possible that we should be doing:

 case <-time.After(timeout):
  sock.Close() // We didn't hear back in time
  // do other stuff

Or some other mechanism.

I don't see an explicit way to SetTimeout, but you do have SetDeadline
and SetNoDelay
http://golang.org/pkg/net/#TCPConn.SetDeadline
The main problem with SetDeadline is that you have to continually
refresh it.
http://golang.org/pkg/net/#Conn

Note that to implement timeout directly we have to wrap
net/http/Transport (which is a RoundTripper) and then wrap Dial to
return a net.Conn. We would then need to have a custom net.Conn that
did some sort of SetDeadline before every Read or Write call.

Client does seem to expose its Transport object, but Transport doesn't
expose its underlying connection cache for us to call Conn.SetDeadline
ourselves.

http://stackoverflow.com/questions/16895294/how-to-set-timeout-for-http-get-requests-in-golang

John
=:->

Curtis Hovey (sinzui)
tags: added: canonistack
removed: prodstack
Curtis Hovey (sinzui)
tags: added: preformance security
Curtis Hovey (sinzui)
tags: added: performance
removed: preformance
Changed in juju-core:
importance: High → Medium
Anastasia (anastasia-macmood) wrote :

Re-targeting to Juju 2.x

Changed in juju:
status: New → Triaged
importance: Undecided → Wishlist
Changed in juju-core:
status: Triaged → Won't Fix
Canonical Juju QA Bot (juju-qa-bot) wrote :

This bug has not been updated in 5 years, so we're marking it Expired. If you believe this is incorrect, please update the status.

Changed in juju:
status: Triaged → Expired
tags: added: expirebugs-bot