[2.1.1] regiond requests to 127.0.0.1:5240 going via external proxy and failing

Bug #1643403 reported by Trent Lloyd
This bug affects 5 people
Affects     Status      Importance   Assigned to   Milestone
MAAS        Invalid     High         Unassigned    -
MAAS 2.1    Won't Fix   High         Unassigned    -

Bug Description

I upgraded my 2.0 installation to 2.1.1. I noticed requests failing in the regiond log, and that these requests to 127.0.0.1 were being sent to my external APT proxy.

/etc/maas/rackd.conf
maas_url: http://localhost:5240/MAAS

/etc/maas/regiond.conf
maas_url: http://100.64.0.253/MAAS

Error from /var/log/maas/regiond.log
2016-11-21 10:41:10 stderr: [error] request to http://127.0.0.1:5240/MAAS/metadata/status/4y3h7n/latest failed. sleeping 1.: HTTP Error 503: Service Unavailable

From proxy log:
1479695886.079 1 100.64.0.253 TCP_MISS/503 4653 POST http://127.0.0.1:5240/MAAS/metadata/status/4y3h7n/latest - HIER_DIRECT/127.0.0.1 text/html

A few interesting notes:
 - The localhost config is stored in the rackd config file
 - The errors were being generated by regiond, seemingly connecting to itself based on the rackd config
 - If I change the rackd.conf URL from localhost to 100.64.0.253 then not only do the errors go away, but the requests are no longer sent via the proxy
 - Disabling the proxy also 'fixed' the problem

It seems there is a bug in the logic that decides when to use the proxy: it appears to have an exception for its own IPs, but that exception does not include localhost. If this is fixed with a whitelist, ::1 should probably be whitelisted as well.
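
A rough sketch of what such a whitelist check could look like (the helper name and the explicit own-IP list here are illustrative only, not MAAS code):

import ipaddress
import socket


def should_bypass_proxy(host, own_ips=()):
    """Return True if requests to `host` should skip the HTTP proxy.

    Hypothetical helper: covers localhost names, 127.0.0.0/8, ::1, and
    any addresses known to belong to this machine.
    """
    if host in ("localhost", "localhost.localdomain"):
        return True
    try:
        addr = ipaddress.ip_address(host)
    except ValueError:
        try:
            # Not a literal IP; resolve it and test the result instead.
            addr = ipaddress.ip_address(socket.gethostbyname(host))
        except (socket.gaierror, ValueError):
            return False
    return addr.is_loopback or str(addr) in own_ips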

Changed in maas:
status: New → Triaged
importance: Undecided → Critical
milestone: none → 2.2.0
importance: Critical → High
tags: added: docteam
Revision history for this message
Mike Pontillo (mpontillo) wrote :

I am not sure why the region seems to be making HTTP calls to itself in this example. That's strange to me and I think it requires explanation. (The only thing I can think of is that we run a subset of the commissioning scripts on the region in order to collect data to populate the database; perhaps one of those scripts is doing it.)

I think the "TCP_MISS/503" in the proxy log simply means that the proxy tried to access the server, but it returned an HTTP 503 (unavailable) error. That seems like it should be temporary. Is this consistent?

The other thing I would note here is that (at least in trunk) we have the following rule in maas-proxy.conf.template:

    acl localnet src 127.0.0.0/8

So the requests are indeed being allowed by the proxy. I think the issue, then, is with the HTTP server answering the request on that port, not the proxy. But it's still weird that the requests are going through the proxy.

This is a long shot, but the other thing I would check in this situation is the iptables rules. I have seen situations where broad NAT rules cause the system to communicate with itself in weird ways (such as SNAT-ing traffic toward localhost).

Revision history for this message
Mike Pontillo (mpontillo) wrote :

Ah, sorry, I went off on the wrong track on this one. I thought you meant the MAAS (squid) proxy, not the external apt proxy.

In that case, I would try to find out if all the proxy configuration variables are being properly set.

export no_proxy="localhost,127.0.0.1,::1,<local-address-1[,...]>"

I also notice that in MAAS we add several variations of localhost (and probably the region's own IPs) in 'src/provisioningserver/rpc/boot_images.py'. But it looks like that code is only used for importing the boot images.
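
For reference, Python's standard proxy handling honours no_proxy from the environment (on Linux); a quick sketch with made-up values showing which hosts would and would not be sent via the proxy:

import os
import urllib.request

# Made-up proxy URL; the point is only to show how no_proxy affects the
# standard library's bypass decision.
os.environ["http_proxy"] = "http://proxy.example:3128"
os.environ["no_proxy"] = "localhost,127.0.0.1,::1"

print(urllib.request.proxy_bypass("127.0.0.1"))     # truthy -> proxy skipped
print(urllib.request.proxy_bypass("::1"))           # truthy -> proxy skipped
print(urllib.request.proxy_bypass("100.64.0.253"))  # falsy  -> goes via proxy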

Revision history for this message
Mike Pontillo (mpontillo) wrote :

Taking a look at the usages of http_proxy in the MAAS code, I think it must be enlistment. From 'src/maasserver/preseed.py':

def get_enlist_userdata(rack_controller=None):
    """Return the enlistment preseed.

    :param rack_controller: The rack controller used to generate the preseed.
    :return: The rendered enlistment user-data string.
    :rtype: unicode.
    """
    server_host = get_maas_facing_server_host(rack_controller=rack_controller)
    http_proxy = None
    if Config.objects.get_config('enable_http_proxy'):
        if Config.objects.get_config('http_proxy'):
            http_proxy = Config.objects.get_config('http_proxy')
        elif server_host:
            http_proxy = "http://%s:8000/" % server_host
    enlist_userdata = render_enlistment_preseed(
        USERDATA_TYPE.ENLIST, rack_controller=rack_controller)
    config = get_system_info()
    config.update({'apt_proxy': http_proxy})
    config.update(get_enlist_archive_config(http_proxy))
    return enlist_userdata + yaml.safe_dump(config).encode('utf-8')
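
To make that fallback concrete (made-up values, not MAAS code), the value that ends up as apt_proxy is chosen roughly like this:

# Sketch of the fallback above; the proxy URL is invented for illustration.
def pick_apt_proxy(enable_http_proxy, http_proxy, server_host):
    if not enable_http_proxy:
        return None
    if http_proxy:
        return http_proxy                        # configured external proxy wins
    if server_host:
        return "http://%s:8000/" % server_host   # fall back to MAAS's own squid
    return None

print(pick_apt_proxy(True, "http://proxy.example:3128/", "100.64.0.253"))
# -> http://proxy.example:3128/   (roughly the reporter's situation)
print(pick_apt_proxy(True, None, "100.64.0.253"))
# -> http://100.64.0.253:8000/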

Not sure why an enlistment request is coming through on 127.0.0.1. (This might be a cloud-init problem, because I don't see where the HTTP proxy environment variables are used in any MAAS commissioning scripts... but why is cloud-init talking to 127.0.0.1? Is there some NAT involved here?)

The only other direct usage of the proxies I see in the MAAS code is where we grab SSH keys from Launchpad or GitHub, and given the URL I don't think that's the issue here. (But we don't pass no_proxy in when we make the request, so that's a possible -- however unlikely -- bug.)

Revision history for this message
Mike Pontillo (mpontillo) wrote :

All right. After talking through this, I think I understand the bug now.

(1) MAAS passes in the HTTP proxy to cloud-init.

(2) cloud-init tries to use the HTTP proxy for enlistment, because we don't pass any proxy exceptions to cloud-init.[1]

(3) The admin has 'localhost' in their rackd.conf for the MAAS URL, and the enlistment request uses that as the URL to connect to MAAS.[2]

(4) Enlistment fails because the external HTTP proxy tries to reach 127.0.0.1:5240, which is its own loopback rather than the region, and the request fails.

Does this theory sound like what might be happening here?

--

[1]: Also, I don't know if it's possible to pass no_proxy rules to cloud-init. In addition, assuming we could do that, I don't know how MAAS would calculate those rules; possibly the same way we calculate access to the internal squid proxy?

[2]: I thought that was supposed to be dynamically replaced with the best region IP, but maybe that isn't happening.
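
On [2], one way to spot this kind of misconfiguration would be a check along these lines (purely a hypothetical diagnostic, not part of MAAS; it assumes rackd.conf parses as plain YAML with a maas_url key):

import ipaddress
import socket
from urllib.parse import urlparse

import yaml

# Flag a maas_url that resolves to loopback, which is the setup that trips
# this bug: other machines (and an external proxy) cannot reach the region
# on that address.
with open("/etc/maas/rackd.conf") as fh:
    maas_url = yaml.safe_load(fh)["maas_url"]

host = urlparse(maas_url).hostname
addr = ipaddress.ip_address(socket.gethostbyname(host))
if addr.is_loopback:
    print("maas_url %s resolves to %s (loopback)" % (maas_url, addr))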

Revision history for this message
Mike Pontillo (mpontillo) wrote :

One question: which node does the 4y3h7n system_id belong to in this example? (or whatever other system_id you are seeing in the log currently, if that setup is gone now)

Revision history for this message
Andreas Hasenack (ahasenack) wrote : Re: [Bug 1643403] Re: [2.1.1] regiond requests to 127.0.0.1:5240 going via external proxy and failing

I have this happening right now with 2.1.3 using an external proxy; I can
get you all the details tomorrow since I'm EOD today.


Revision history for this message
Andreas Hasenack (ahasenack) wrote :

I don't see it happening anymore; it's not entirely clear what I changed.

From the logs, I can see I updated resolv.conf to use MAAS's own DNS server, and then the errors stopped:
regiond.log:
2017-01-05 13:14:55 stderr: [error] request to http://127.0.0.1:5240/MAAS/metadata/status/4y3h7n/latest failed. sleeping 32.: HTTP Error 503: Service Unavailable
2017-01-05 13:15:11 twisted.python.log: [info] ::1 - - [05/Jan/2017:18:15:11 +0000] "GET /MAAS/rpc/ HTTP/1.0" 200 196 "-" "provisioningserver.rpc.clusterservice.ClusterClientService"
2017-01-05 13:15:38 -: [info] b'/etc/resolv.conf' changed, reparsing
2017-01-05 13:15:38 -: [info] Resolver added ('10.245.200.25', 53) to server list
2017-01-05 13:15:38 -: [info] Resolver added ('10.245.200.1', 53) to server list

squid log:
05/Jan/2017:18:14:31 +0000 0 10.245.200.25 TCP_MISS/503 4540 POST http://127.0.0.1:5240/MAAS/metadata/status/4y3h7n/latest - HIER_DIRECT/127.0.0.1 text/html
05/Jan/2017:18:14:39 +0000 0 10.245.200.25 TCP_MISS/503 4539 POST http://127.0.0.1:5240/MAAS/metadata/status/4y3h7n/latest - HIER_DIRECT/127.0.0.1 text/html
05/Jan/2017:18:14:55 +0000 0 10.245.200.25 TCP_MISS/503 4539 POST http://127.0.0.1:5240/MAAS/metadata/status/4y3h7n/latest - HIER_DIRECT/127.0.0.1 text/html

That was the last attempt to reach 127.0.0.1 via the proxy.

tags: added: landscape
Changed in maas:
milestone: 2.2.0 → 2.2.x
Changed in maas:
milestone: 2.2.x → 2.3.x
Revision history for this message
Adam Collard (adam-collard) wrote :

This bug has not seen any activity in the last 6 months, so it is being automatically closed.

If you are still experiencing this issue, please feel free to re-open.

MAAS Team

Changed in maas:
status: Triaged → Invalid