MAAS enlistment fails when region is behind a NAT

Bug #1743005 reported by Scott Hussey
Affects   Status         Importance   Assigned to   Milestone
MAAS      Fix Released   High         Unassigned
2.3       Fix Released   High         Unassigned

Bug Description

MaaS 2.3 no longer allows explicit configuration of the metadata_url used for cloud-init in bootstrapping nodes. When running regiond in Kubernetes behind a NodePort, the metadata_url returned in the response to get_enlist_preseed is a loopback address (http://[::1]:31900/..., see the response below), which the enlisting node cannot use to reach the region.

Introduced by: https://code.launchpad.net/~mpontillo/maas/+git/maas/+ref/better-default-maas-url--bug-1418044

Region routing info:
root@maas-region-0:/# ip addr
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host
       valid_lft forever preferred_lft forever
2: tunl0@NONE: <NOARP> mtu 1480 qdisc noop state DOWN group default qlen 1
    link/ipip 0.0.0.0 brd 0.0.0.0
4: eth0@if54: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1480 qdisc noqueue state UP group default
    link/ether 36:dd:0a:ce:cf:70 brd ff:ff:ff:ff:ff:ff link-netnsid 0
    inet 10.97.166.173/32 scope global eth0
       valid_lft forever preferred_lft forever
    inet6 fe80::34dd:aff:fece:cf70/64 scope link
       valid_lft forever preferred_lft forever
root@maas-region-0:/# ip route
default via 169.254.1.1 dev eth0
169.254.1.1 dev eth0 scope link
root@maas-region-0:/# ip route get 172.24.1.1
172.24.1.1 via 169.254.1.1 dev eth0 src 10.97.166.173
    cache
root@maas-region-0:/# cat /etc/maas/regiond.conf
database_host: postgresql.ucp.svc.cluster.local
database_name: maasdb
database_pass: ########
database_user: maas
maas_url: http://172.24.1.100:31900/MAAS

Response to get_enlist_preseed
ubuntu@cab23-r720-16:~$ curl -v http://172.24.1.100:31900/MAAS/metadata/latest/enlist-preseed/?op=get_enlist_preseed
* Trying 172.24.1.100...
* Connected to 172.24.1.100 (172.24.1.100) port 31900 (#0)
> GET /MAAS/metadata/latest/enlist-preseed/?op=get_enlist_preseed HTTP/1.1
> Host: 172.24.1.100:31900
> User-Agent: curl/7.47.0
> Accept: */*
>
< HTTP/1.1 200 OK
< Date: Fri, 12 Jan 2018 15:48:09 GMT
< Server: TwistedWeb/16.0.0
< X-Maas-Api-Hash: 42eb95deab0d98e655f9f09beb5c9b1d97beb10f
< Vary: Authorization,Accept-Encoding
< Content-Type: text/plain
< X-Frame-Options: SAMEORIGIN
< Transfer-Encoding: chunked
<
#cloud-config
datasource:
  MAAS:
    timeout : 50
    max_wait : 120
    # there are no default values for metadata_url or oauth credentials
    # If no credentials are present, non-authed attempts will be made.
    metadata_url: http://[::1]:31900/MAAS/metadata/enlist

output: {all: '| tee -a /var/log/cloud-init-output.log'}
* Connection #0 to host 172.24.1.100 left intact
ubuntu@cab23-r720-16:~$ ip route get 172.24.1.100
172.24.1.100 dev pxe1-if src 172.24.1.1
    cache
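
Note on the response above: the returned metadata_url points at [::1], the IPv6 loopback address, so a booting node that follows it would try to contact itself rather than the region. A minimal sketch of a check for this condition follows; the helper name `is_loopback_url` is hypothetical and not part of MAAS.

```python
import ipaddress
from urllib.parse import urlparse


def is_loopback_url(url: str) -> bool:
    """Return True if the URL's host is a loopback address or 'localhost'.

    Hypothetical helper for illustration only; not MAAS code.
    """
    host = urlparse(url).hostname
    if host is None:
        return False
    if host == "localhost":
        return True
    try:
        return ipaddress.ip_address(host).is_loopback
    except ValueError:
        # Not an IP literal (a DNS name); deciding would require resolution.
        return False


# URL taken from the preseed response above, and the expected NAT'd URL.
print(is_loopback_url("http://[::1]:31900/MAAS/metadata/enlist"))
print(is_loopback_url("http://172.24.1.100:31900/MAAS/metadata/enlist"))
```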

root@maas-region-0:/# dpkg -l '*maas*' | cat
Desired=Unknown/Install/Remove/Purge/Hold
| Status=Not/Inst/Conf-files/Unpacked/halF-conf/Half-inst/trig-aWait/Trig-pend
|/ Err?=(none)/Reinst-required (Status,Err: uppercase=bad)
||/ Name Version Architecture Description
+++-===============================-====================================-============-=============================================
un maas <none> <none> (no description available)
ii maas-cli 2.3.0-6434-gd354690-0ubuntu1~16.04.1 all MAAS client and command-line interface
ii maas-common 2.3.0-6434-gd354690-0ubuntu1~16.04.1 all MAAS server common files
ii maas-dns 2.3.0-6434-gd354690-0ubuntu1~16.04.1 all MAAS DNS server
ii maas-proxy 2.3.0-6434-gd354690-0ubuntu1~16.04.1 all MAAS Caching Proxy
ii maas-region-api 2.3.0-6434-gd354690-0ubuntu1~16.04.1 all Region controller API service for MAAS
ii maas-region-controller 2.3.0-6434-gd354690-0ubuntu1~16.04.1 all Region Controller for MAAS
un maas-region-controller-min <none> <none> (no description available)
un python-django-maas <none> <none> (no description available)
un python-maas-client <none> <none> (no description available)
ii python3-django-maas 2.3.0-6434-gd354690-0ubuntu1~16.04.1 all MAAS server Django web framework (Python 3)
ii python3-maas-client 2.3.0-6434-gd354690-0ubuntu1~16.04.1 all MAAS python API client (Python 3)
ii python3-maas-provisioningserver 2.3.0-6434-gd354690-0ubuntu1~16.04.1 all MAAS server provisioning libraries (Python 3)

Scott Hussey (sh8121) wrote :
Changed in maas:
status: New → Incomplete
Andres Rodriguez (andreserl) wrote :

Hi Scott,

IIRC, enlistment would always use the IP address from the maas_url field in rackd.conf. If that was set to localhost, MAAS would use the IP from maas_url in regiond.conf. This typically means both region and rack are on the same machine.

The bugfix you referenced should fix situations in which rackd.conf is set to localhost but we know that the machine PXE booted from an IP: instead of blindly giving the IP of maas_url in regiond.conf, we give the right IP address of the region controller. That said, if rackd.conf has a proper maas_url, it will use that instead.

Now, if you hit the metadata directly with curl, it may not give you the information you are looking for. So, to better debug this, can you also attach:

1. Rackd.conf
2. A console log from when the machine is PXE booting off MAAS (which would give the kernel parameters), or the kernel log of the ephemeral environment, which should also contain the parameters.

Mike Pontillo (mpontillo) wrote :

Can you be more specific about what you mean by "no longer allows explicit configuration of the metadata_url used for cloud-init"? Where were you configuring this before?

The branch you referenced fixed a bug that caused incorrect source address selection, but it does not prevent a MAAS admin from defining the `maas_url` in the rackd.conf file; rather, it intentionally preserves that behavior.

That is, you should be able to edit `/etc/maas/rackd.conf` and define `maas_url: <your-desired-url>`, and the region will use that URL when hosts boot from that rack controller. (If you have done that and it still doesn't work, then this bug is valid and should be fixed, but it's not clear to me from the bug description.)

In `src/maasserver/server_address.py` you can see that `get_maas_facing_server_host(...)` checks if `rack_controller.url` is defined and uses that as a first-priority rather than the new `default_region_ip` determined by source address selection.
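
A rough sketch of that priority order, for illustration only. This is not the actual code in `server_address.py`: the function name `pick_region_host`, its parameters, and the explicit localhost check are assumptions based on the behavior described in this thread.

```python
from typing import Optional
from urllib.parse import urlparse

# Hosts treated as "not really configured" (assumption for this sketch).
LOOPBACK_NAMES = ("localhost", "127.0.0.1", "::1")


def pick_region_host(
    rack_url: Optional[str],
    default_region_ip: Optional[str],
    region_maas_url: str,
) -> Optional[str]:
    """Hypothetical sketch of the host selection order.

    1. The maas_url configured on the rack controller, unless it is localhost.
    2. The region IP found by source address selection.
    3. The maas_url from regiond.conf as a last resort.
    """
    if rack_url:
        host = urlparse(rack_url).hostname
        if host and host not in LOOPBACK_NAMES:
            return host
    if default_region_ip:
        return default_region_ip
    return urlparse(region_maas_url).hostname


# Values taken from the configuration files quoted in this report.
print(pick_region_host(
    "http://172.24.1.100:31900/MAAS",
    "10.97.166.173",
    "http://172.24.1.100:31900/MAAS",
))
```

Under this ordering a rack controller with an explicit, non-localhost `maas_url` never reaches the source address selection step.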

Mike Pontillo (mpontillo) wrote :

By the way, a couple other points:

 - You can also configure the URL by running `sudo dpkg-reconfigure -plow maas-rack-controller`.
 - You must restart the maas-rackd service after configuring the URL for the region to recognize the URL. (If you use `dpkg-reconfigure`, this should happen automatically.)

Scott Hussey (sh8121) wrote :

This issue is separate from anything in rackd as far as I can tell. The initial URL given to cloud-init is correct and is used for the 'get_enlist_preseed' operation. In the response to this call, the metadata server provides another URL as 'metadata_url', and this is where the issue lies. Previously (MaaS 2.2) we were specifying this by putting an explicit IP-based URL in /etc/maas/regiond.conf:maas_url. That value is now ignored in favor of the results of inspecting the route table. I can look into the dpkg-reconfigure option at some point.

Andres Rodriguez (andreserl) wrote :

@Scott,

For us to be able to better debug this issue, can you please share your rackd.conf or confirm the value of /etc/maas/rackd.conf:maas_url on the rack controller the machine is PXE booting from? Again, if it is set to localhost, it is a single region/rack controller, and this is the only time MAAS should fall back to the value of /etc/maas/regiond.conf:maas_url.

There should not be any situation in which a split rack controller that points to the IP of the region would fall back to the IP in /etc/maas/regiond.conf:maas_url, provided that MAAS gives the IP of the region specified in /etc/maas/rackd.conf:maas_url.

Knowing how the rack controller is configured will let us determine where the issue could be. What I think is happening is that when the node accesses the metadata, MAAS doesn't really know which rack it PXE booted from and tries to obtain the best address instead of falling back to /etc/maas/regiond.conf:maas_url. However, whether we see that behavior depends on how the rack is configured.

Scott Hussey (sh8121) wrote :

It is the same as regiond.conf

root@node1:/# cat /etc/maas/rackd.conf
cluster_uuid: 422cc156-d4e5-43a6-b200-2507c3e7ec37
maas_url: http://172.24.1.100:31900/MAAS

Scott Hussey (sh8121) wrote :

This patch is our current workaround and solves the problem in our environment:

https://review.gerrithub.io/#/c/394591/3/images/maas-region-controller/2.3_nat_fix.patch

Changed in maas:
status: Incomplete → Triaged
importance: Undecided → High
milestone: none → 2.3.x
milestone: 2.3.x → 2.4.0alpha1
Mike Pontillo (mpontillo) wrote :

Thanks for posting your workaround for this. I'll think about whether we can do a more general fix. Right now I'm leaning toward a configuration option, but we'll try to avoid that if possible. The original fix is a great help for most customers setting up MAAS, since the default URL chosen for the configuration file is arbitrary and often incorrect or unreachable.

Mike Pontillo (mpontillo) wrote :

After looking closer at this, it seems the best way to fix this may be to make use of the 'Host' header. That is, if the HTTP request included a 'Host' header, then rather than trying to guess what the best IP address is, we already know how the client reached the metadata server. This way, the URL stays consistent no matter what.
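
To illustrate the idea (a minimal sketch only, not the code from the eventual fix): the function name `metadata_url_from_host_header`, the plain mapping used for headers, and the default scheme are assumptions. The `/MAAS/metadata/enlist` path is the one seen in the preseed response in the bug description.

```python
from typing import Mapping


def metadata_url_from_host_header(
    headers: Mapping[str, str],
    fallback_url: str,
    scheme: str = "http",
    path: str = "/MAAS/metadata/enlist",
) -> str:
    """Build the metadata URL from the request's Host header.

    Hypothetical helper: if the client sent a Host header, reuse it so the
    URL matches the address the client actually used to reach the server.
    Otherwise return the fallback URL unchanged.
    """
    # Header names are case-insensitive, so compare in lower case.
    host = next(
        (value for name, value in headers.items() if name.lower() == "host"),
        "",
    ).strip()
    if host:
        return f"{scheme}://{host}{path}"
    return fallback_url


# Host header as sent by curl in the bug description.
print(metadata_url_from_host_header(
    {"Host": "172.24.1.100:31900"},
    fallback_url="http://[::1]:31900/MAAS/metadata/enlist",
))
```

With the request from the bug description this yields a URL on 172.24.1.100:31900, the NodePort address the node already reached, instead of the loopback address.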

Scott Hussey (sh8121) wrote :

@mpontillo I believe the Host header solution would work in our environment, good idea.

Mike Pontillo (mpontillo) wrote :

I've got a branch in-progress that should fix this in the correct way; if you'd like to test it early, I've posted it here:

https://code.launchpad.net/~mpontillo/maas/+git/maas/+merge/336152

Unit test updates and code review/approvals are required before it can land and be backported.

Scott Hussey (sh8121) wrote :

I'll try to get a test with this. I'm tracing some code around building the APT proxy URL and not sure if it might suffer some of the same issues. Unrelated, but would be nice to make the maas-proxy port configurable.

Changed in maas:
status: Triaged → Fix Committed
Mike Pontillo (mpontillo) wrote :

Were you able to get a test run with this fix? We've landed it now, so I hope this will work for you.

Changed in maas:
status: Fix Committed → Fix Released