MAAS not sending correct metadata_url

Bug #1982315 reported by Romain
This bug affects 7 people
Affects   Status         Importance   Assigned to      Milestone
MAAS      Fix Released   Medium       Alberto Donato
3.2       Fix Released   Critical     Alberto Donato

Bug Description

Hello,

I tried to install a new region controller and decided to install the latest version of MAAS (3.2). When I tried to commission a node, it didn't work! I got this message:

cloudinit.sources.DataSourceNotFoundException() Did not find any data source.

So I did a network capture and figured out the metadata url was wrong:

in 3.2: metadata_url: http://10-246-72-0--24.maas-internal:5248/MAAS/metadata/\n

So I removed MAAS 3.2, installed 3.0, and got the correct metadata_url:

3.0: metadata_url: http://10.246.72.254:5248/MAAS/metadata/\n

Then I tried installing 3.0 and upgrading to 3.2 (instead of installing 3.2 directly) and got the same problem. Do you think using an FQDN instead of an IP address could fix the problem?

Best regards,
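
For readers following along: the hostname in the 3.2 URL appears to be derived from the subnet CIDR, with dots replaced by dashes and the prefix length appended after a double dash (compare the subnet 10.31.11.0/24 and the name 10-31-11-0--24.maas-internal later in this report). The Python sketch below reproduces that mapping. It is inferred only from the examples in this thread, not taken from the MAAS source; the function name is hypothetical and only IPv4 is handled.

```python
import ipaddress

# Domain suffix as it appears in the URLs quoted in this report.
INTERNAL_DOMAIN = "maas-internal"


def internal_name_for_subnet(cidr: str) -> str:
    """Guess the internal DNS name for an IPv4 subnet (hypothetical helper)."""
    network = ipaddress.IPv4Network(cidr, strict=True)
    # 10.246.72.0 becomes 10-246-72-0, then "--<prefix length>" is appended.
    label = f"{str(network.network_address).replace('.', '-')}--{network.prefixlen}"
    return f"{label}.{INTERNAL_DOMAIN}"


if __name__ == "__main__":
    print(internal_name_for_subnet("10.246.72.0/24"))
```

If a node cannot resolve the resulting name through its configured DNS server, cloud-init has no way to reach the metadata endpoint, which matches the "Did not find any data source" error above.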


Bill Wear (billwear)
Changed in maas:
status: New → Triaged
importance: Undecided → Medium
Graham Jones (grahamjwar) wrote :

Are you using the MAAS server for DNS resolution on the nodes you are commissioning? I believe it should be able to resolve that metadata_url.

Romain (romain-chanu) wrote :

MAAS is not used as a resolver. I have always used external DNS, and this is the first time I have faced this issue.

Graham Jones (grahamjwar) wrote :

I also used an external DNS until I encountered this problem when switching to the new release of MAAS. When I switched to using the MAAS DNS I was able to commission nodes again normally.

Romain (romain-chanu) wrote :

Hello,

I updated the dns-server option in my DHCP configuration to the MAAS IP address and commissioning works again. It's a good workaround if someone really needs a 3.2 feature, but for now I prefer to use 3.0 and be able to use my own DHCP server.

Best regards

Martin Vyšohlíd (kamik) wrote :

I am facing a similar problem with MAAS version 3.2. My setup uses the MAAS servers for both DHCP and DNS services. In this case the problem is that the MAAS DNS has no record for segments which are relayed via another fabric.

2022-08-03 15:08:03,149 - url_helper.py[ERROR]: Timed out, no response from urls: ['http://10-31-11-0--24.maas-internal:5248/MAAS/metadata/2012-03-01/meta-data/instance-id']
2022-08-03 15:08:03,149 - DataSourceMAAS.py[CRITICAL]: Giving up on md from ['http://10-31-11-0--24.maas-internal:5248/MAAS/metadata/2012-03-01/meta-data/instance-id'] after 126 seconds

The name for the relayed segment does not resolve:
root@ubuntu:/media/root-rw/overlay# host 10-31-11-0--24.maas-internal
Host 10-31-11-0--24.maas-internal not found: 3(NXDOMAIN)

The name works fine for the direct segment (fabric-5-prz-intersys-mgmt):
root@ubuntu:/media/root-rw/overlay# host 172-17-36-0--22.maas-internal
172-17-36-0--22.maas-internal has address 172.17.36.54
172-17-36-0--22.maas-internal has address 172.17.36.53
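
The same lookup can be scripted when many subnets need checking. The Python sketch below is only an illustration: it uses the standard library resolver of the machine it runs on (so it has to run on a host that uses the same DNS server as the nodes), and the helper name is hypothetical. The `lookup` parameter exists only so the function can be exercised without a network.

```python
import socket
from typing import Callable, List


def resolve_ipv4(name: str, lookup: Callable = socket.getaddrinfo) -> List[str]:
    """Return the sorted IPv4 addresses for name, or [] if it does not resolve."""
    try:
        results = lookup(name, None, socket.AF_INET, socket.SOCK_STREAM)
    except socket.gaierror:
        # Resolution failures such as NXDOMAIN end up here.
        return []
    # Each result is (family, type, proto, canonname, sockaddr);
    # for AF_INET, sockaddr is (address, port).
    return sorted({sockaddr[0] for *_, sockaddr in results})


if __name__ == "__main__":
    for name in ("10-31-11-0--24.maas-internal", "172-17-36-0--22.maas-internal"):
        addresses = resolve_ipv4(name)
        print(name, "->", addresses if addresses else "not resolvable")
```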

Nicholas Fries (nicfries) wrote :

This bug is also affecting us. We use our own external DHCP and DNS.

tags: added: bug-council
Adam Collard (adam-collard) wrote :

Let's understand why we changed from IP to hostname. If that reasoning holds and we don't want to revert, then we should at least document what folks need to do with the DNS/DHCP configuration on their setup.

tags: removed: bug-council
Changed in maas:
milestone: none → 3.3.0
Alberto Donato (ack) wrote :

Could someone that's experiencing the issue please paste the output of

maas $profile subnet read $subnet_id

for the subnet the rackcontroller IP belongs to?

Martin Vyšohlíd (kamik) wrote :

This is the output for one of the relayed segments, for which DNS names are not generated:
root@maas-rack-controller-01:~# maas admuser subnet read 4
Success.
Machine-readable output follows:
{
    "name": "10.31.11.0/24",
    "description": "",
    "vlan": {
        "vid": 0,
        "mtu": 1500,
        "dhcp_on": false,
        "external_dhcp": null,
        "relay_vlan": {
            "vid": 0,
            "mtu": 1500,
            "dhcp_on": true,
            "external_dhcp": null,
            "relay_vlan": null,
            "secondary_rack": "4wdnke",
            "fabric_id": 0,
            "space": "MAAS-Interconnect",
            "name": "untagged",
            "primary_rack": "qmb8dd",
            "id": 5001,
            "fabric": "fabric-0",
            "resource_uri": "/MAAS/api/2.0/vlans/5001/"
        },
        "secondary_rack": null,
        "fabric_id": 4,
        "space": "LAB-default",
        "name": "untagged",
        "primary_rack": null,
        "id": 5006,
        "fabric": "fabric-4-lab-default",
        "resource_uri": "/MAAS/api/2.0/vlans/5006/"
    },
    "cidr": "10.31.11.0/24",
    "rdns_mode": 2,
    "gateway_ip": "10.31.11.1",
    "dns_servers": [
        "10.31.255.31",
        "10.31.255.32"
    ],
    "allow_dns": true,
    "allow_proxy": false,
    "active_discovery": true,
    "managed": true,
    "disabled_boot_architectures": [],
    "space": "LAB-default",
    "id": 4,
    "resource_uri": "/MAAS/api/2.0/subnets/4/"
}
root@maas-rack-controller-01:~#
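
When comparing several subnets, the relevant fields of this output can be summarised with a few lines of Python. This is a rough sketch that relies only on the field names visible in the JSON above (`cidr`, `vlan`, `dhcp_on`, `relay_vlan`, `id`, `fabric`); the function name is hypothetical, and it assumes the JSON object has been separated from the "Success." banner lines.

```python
import json


def describe_dhcp(subnet: dict) -> str:
    """Summarise how DHCP reaches a subnet, based on its VLAN settings."""
    vlan = subnet["vlan"]
    relay = vlan.get("relay_vlan")
    if vlan.get("dhcp_on"):
        return f"{subnet['cidr']}: DHCP served directly on VLAN {vlan['id']}"
    if relay is not None:
        return f"{subnet['cidr']}: DHCP relayed via VLAN {relay['id']} on {relay['fabric']}"
    return f"{subnet['cidr']}: no MAAS-managed DHCP"


# Trimmed copy of the output above, keeping only the fields the sketch reads.
SAMPLE = """
{
    "cidr": "10.31.11.0/24",
    "vlan": {
        "id": 5006,
        "dhcp_on": false,
        "relay_vlan": {"id": 5001, "dhcp_on": true, "fabric": "fabric-0"}
    }
}
"""

if __name__ == "__main__":
    print(describe_dhcp(json.loads(SAMPLE)))
```

Subnets reported as relayed are the ones worth checking for a missing maas-internal DNS record.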

Alberto Donato (ack)
Changed in maas:
status: Triaged → In Progress
assignee: nobody → Alberto Donato (ack)
Changed in maas:
status: In Progress → Fix Committed
Ventsislav Georgiev (vgeorgiev) wrote :

I'm also facing the same issue. This works fine for existing MAAS hosts but not for new ones. I'm also using an external DNS, and it has always worked fine.

Ventsislav Georgiev (vgeorgiev) wrote :

Hey there, any update on this? Or an ETA for when MAAS 3.2.5 will be released? I tried using the solution from #4 but it didn't work for me.

Björn Tillenius (bjornt) wrote :

We're in the process of releasing MAAS 3.2.5. There are builds up in ppa:maas/3.2-next, as well as in the 3.2/beta snap channel.

We're currently testing the builds and will release them when they've passed all tests.

Changed in maas:
status: Fix Committed → Fix Released
status: Fix Released → Fix Committed
Marek Grudzinski (ivve) wrote :

Hello there,

This fix does not seem to work if the network is relayed. I tried manually applying the change and also tried downloading the package from 3.2-next just to be sure. It doesn't seem to matter whether we use an external DNS or the MAAS one. Now when the deployed machine tries to get the metadata, it tries to get it from itself. For example, if the deployed machine gets 192.168.1.10, it tries to fetch http://192.168.1.10:5248/MAAS/metadata... etc. instead of fetching the metadata from the MAAS server IP or DNS name.

Any comment on this?

Ventsislav Georgiev (vgeorgiev) wrote :

Hello,

This also doesn't work for us. I updated to the latest version available (3.2.5) and the problem persists for machines "outside" of MAAS. As mentioned above, we are also using external DNS and never had issues before.

On the current version (23696), when I cat the enlist preseed I get only this:
cat /snap/maas/current/etc/maas/preseeds/enlist
{{preseed_data}}

On the previous version (23425) I get the following:
cat /snap/maas/23425/etc/maas/preseeds/enlist
{{preseed_data}}

datasource:
  MAAS:
    metadata_url: x.x.x.x:5240/MAAS/metadata

We are using the following workaround on the MAAS host as a fix at the moment:

# mount -o ro,bind /root/enlist /snap/maas/current/etc/maas/preseeds/enlist
# umount /var/lib/snapd/snap/maas/23425/etc/maas/preseeds/enlist
# mount |grep enl

/dev/md0p2 on /var/lib/snapd/snap/maas/23696/etc/maas/preseeds/enlist type ext4 (ro,relatime,init_itable=0)

I hope this info helps.

Michael Klippberg (klippo) wrote :

#13 worked as a workaround for us as well

I also noticed that the problem only occurs on :5248.

Port 5240 returns maas-ip
Port 5248 returns my client ip

$ curl -s "http://maas-ip:5248/MAAS/metadata/latest/enlist-preseed/?op=get_enlist_preseed" | grep datasource -A2
datasource:
  MAAS:
    metadata_url: http://client-ip:5248/MAAS/metadata/

$ curl -s "http://maas-ip:5240/MAAS/metadata/latest/enlist-preseed/?op=get_enlist_preseed" | grep datasource -A2
datasource:
  MAAS:
    metadata_url: http://maas-ip:5240/MAAS/metadata/

Edit:

This combination seems to be part of the problem:

https://github.com/maas/maas/blame/3.2/src/maasserver/compose_preseed.py#L47
and
https://github.com/maas/maas/blob/3.2/src/provisioningserver/templates/http/rackd.nginx.conf.template#L20

If I comment out that line in rackd.nginx.conf, I receive the MAAS URL instead of the client IP.
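
The comparison made with curl and grep above can also be automated. The Python sketch below pulls the `metadata_url` host out of a preseed body and checks whether it points back at the requesting client. It works on text that has already been fetched (for example the curl output saved to a file); the function names are hypothetical, and the preseed layout is assumed to match the snippets quoted in this comment.

```python
import re
from typing import Optional
from urllib.parse import urlsplit


def metadata_host(preseed_text: str) -> Optional[str]:
    """Return the host part of the first metadata_url line, if any.

    A URL without a scheme (no "http://") yields None.
    """
    match = re.search(r"^\s*metadata_url:\s*(\S+)", preseed_text, re.MULTILINE)
    if match is None:
        return None
    return urlsplit(match.group(1)).hostname


def points_back_at_client(preseed_text: str, client_ip: str) -> bool:
    """True when the preseed tells the client to fetch metadata from itself."""
    return metadata_host(preseed_text) == client_ip


# Shape taken from the output above, with the client address from an
# earlier comment in this report filled in for illustration.
BROKEN = """datasource:
  MAAS:
    metadata_url: http://192.168.1.10:5248/MAAS/metadata/
"""

if __name__ == "__main__":
    print(metadata_host(BROKEN), points_back_at_client(BROKEN, "192.168.1.10"))
```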

Andy del Hierro (adelhierro) wrote :

Just to add: I got around the problem by reverting the snap to version 3.2.4. I saw the problem with 3.2.5.

It also spiked my CPU and crashed my instance of RegionD. I do have a case open.

Björn Tillenius (bjornt) wrote :

It seems like the fix for this didn't quite work. I've filed bug #1989970, so that we can keep track of the fix.

Alan Baghumian (alanbach) wrote :

We tested 3.2.5 today at a client location that is affected by this issue, and it still has not been resolved. We are unable to enlist any new nodes; they hit the "no datasource found" issue.

Something interesting we noticed (screenshot attached) was that the machine was trying to reach its own IP address for the datasource. We can't explain why.

Alan Baghumian (alanbach) wrote :

I was able to reproduce this in my home lab MAAS also.

Steps:

1. Created a new KVM based virtual machine and initiated a PXE boot.
2. The VM started booting and received an IP address of 10.1.8.134.
3. The enlisting got stuck with the machine trying to download metadata from 10.1.8.134 instead of the actual MAAS endpoints.

Please see the attached video.

Changed in maas:
milestone: 3.3.0 → 3.3.0-beta1
Changed in maas:
status: Fix Committed → Fix Released
Rikimaru Honjo (honjo-rikimaru-c6) wrote :

I hit a similar issue in MAAS 3.2.6/stable and 3.3.0-beta3/beta. The root cause was that I hadn't allowed connections on the DNS port (53) between the region and rack.

The DNS port is not mentioned on the "How to set up a firewall for MAAS" page: https://maas.io/docs/how-to-secure-maas

I wrote up the details on the following page:

https://discourse.maas.io/t/commissioning-failure-in-1-region-2-rack-environment
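
A quick way to spot this kind of firewall gap is to test whether port 53 accepts connections from the other controller. The Python sketch below checks TCP only, so it is a partial test at best: DNS queries normally use UDP, and a firewall may treat the two protocols differently. The function name is hypothetical, and the host to test against is whatever address the region controller has in your setup.

```python
import socket


def tcp_port_open(host: str, port: int = 53, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and unreachable hosts.
        return False


if __name__ == "__main__":
    # Replace 127.0.0.1 with the region controller address when running on a rack.
    print(tcp_port_open("127.0.0.1"))
```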
