Wrong metadata url in enlist cloud-config

Bug #2022926 reported by Michal Kielkowski
This bug affects 5 people
Affects  Status        Importance  Assigned to     Milestone
MAAS     Fix Released  High        Anton Troyanov
3.2      Fix Released  High        Anton Troyanov
3.3      Fix Released  High        Anton Troyanov
3.4      Fix Released  High        Anton Troyanov

Bug Description

Hello,

this is a single-host, rack+region scenario on MAAS version 3.3.3. The metadata URL is presented without a specific port, and it is unreachable on the default port 80.

Steps to reproduce:

When visiting the URL passed to the kernel/initrd by the bootloader:

http://192.168.0.234:5248/MAAS/metadata/latest/enlist-preseed/?op=get_enlist_preseed

I receive the following cloud-config script:

#cloud-config
apt:
  preserve_sources_list: false
  primary:
  - arches:
    - amd64
    - i386
    uri: http://archive.ubuntu.com/ubuntu
  - arches:
    - default
    uri: http://ports.ubuntu.com/ubuntu-ports
  proxy: http://192.168.0.234:8000/
  security:
  - arches:
    - amd64
    - i386
    uri: http://archive.ubuntu.com/ubuntu
  - arches:
    - default
    uri: http://ports.ubuntu.com/ubuntu-ports
  sources_list: 'deb $PRIMARY $RELEASE universe restricted main multiverse

    # deb-src $PRIMARY $RELEASE universe restricted main multiverse

    deb $PRIMARY $RELEASE-updates universe restricted main multiverse

    # deb-src $PRIMARY $RELEASE-updates universe restricted main multiverse

    deb $PRIMARY $RELEASE-backports universe restricted main multiverse

    # deb-src $PRIMARY $RELEASE-backports universe restricted main multiverse

    deb $SECURITY $RELEASE-security universe restricted main multiverse

    # deb-src $SECURITY $RELEASE-security universe restricted main multiverse

    '
datasource:
  MAAS:
    metadata_url: http://192.168.0.234/MAAS/metadata/
manage_etc_hosts: true
packages:
- python3-yaml
- python3-oauthlib
power_state:
  condition: test ! -e /tmp/block-poweroff
  delay: now
  mode: poweroff
  timeout: 1800
rsyslog:
  remotes:
    maas: 192.168.0.234:5247

Please note that metadata_url has no port specified. The enlistment process hangs during ephemeral image startup after

[ OK ] Reached target Host and Network Name Lookups

and then typically times out with no datasource configured, resulting in a failed enlistment.
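
For reference, the missing port can be confirmed directly from the MAAS host. This is only a quick check using the addresses from this report, not a fix:

curl -sS --connect-timeout 5 -o /dev/null -w '%{http_code}\n' http://192.168.0.234/MAAS/metadata/
# fails: nothing listens on the default port 80
curl -sS --connect-timeout 5 -o /dev/null -w '%{http_code}\n' http://192.168.0.234:5248/MAAS/metadata/
# answers via the rack HTTP proxy on port 5248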

You can work around it by replacing the default /etc/maas/preseed/enlist template:

{{preseed_data}}

with

#cloud-config
apt:
  preserve_sources_list: false
  primary:
  - arches:
    - amd64
    - i386
    uri: http://archive.ubuntu.com/ubuntu
  - arches:
    - default
    uri: http://ports.ubuntu.com/ubuntu-ports
  security:
  - arches:
    - amd64
    - i386
    uri: http://archive.ubuntu.com/ubuntu
  - arches:
    - default
    uri: http://ports.ubuntu.com/ubuntu-ports
  sources_list: 'deb $PRIMARY $RELEASE main restricted multiverse universe

    # deb-src $PRIMARY $RELEASE main restricted multiverse universe

    deb $PRIMARY $RELEASE-updates main restricted multiverse universe

    # deb-src $PRIMARY $RELEASE-updates main restricted multiverse universe

    deb $PRIMARY $RELEASE-backports main restricted multiverse universe

    # deb-src $PRIMARY $RELEASE-backports main restricted multiverse universe

    deb $SECURITY $RELEASE-security main restricted multiverse universe

    # deb-src $SECURITY $RELEASE-security main restricted multiverse universe

    '
datasource:
  MAAS:
    metadata_url: http://10.141.200.9:5248/MAAS/metadata/
manage_etc_hosts: true
packages:
- python3-yaml
- python3-oauthlib
power_state:
  condition: test ! -e /tmp/block-poweroff
  delay: now
  mode: poweroff
  timeout: 1800
rsyslog:
  remotes:
    maas: 10.141.200.9:5247

Commissioning and deployment are not affected.
With regards
Michal K.

Revision history for this message
Anton Troyanov (troyanov) wrote :

Hello Michal!

Do you have MAAS running behind a load balancer or reverse proxy?

Or maybe your machine has multiple interfaces?

---
FTR
The function that templates this config starts here: src/maasserver/compose_preseed.py:661

def build_metadata_url(request, route, rack_controller, node=None, extra=""):
    host = _get_rackcontroller_host(request, node=node)
    if host is None and rack_controller is not None:
        host = rack_controller.fqdn
    return (
        request.build_absolute_uri(route) + extra
        if not host
        else f"{request.scheme}://{host}:{RACK_CONTROLLER_PORT}{route}{extra}"
    )
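
A minimal sketch (illustrative only, not MAAS code) of why the port goes missing: when no rack controller host can be resolved, the fallback branch above rebuilds the URL with request.build_absolute_uri(), which in Django is effectively scheme + Host header + path. If the Host header arrives without a port, metadata_url comes out without one. The helper name below is made up for illustration:

def build_absolute_uri_sketch(scheme, host_header, route):
    # Approximates Django's request.build_absolute_uri(): scheme + Host header + path.
    return f"{scheme}://{host_header}{route}"

# Host header forwarded without its port (what happened in this report):
print(build_absolute_uri_sketch("http", "192.168.0.234", "/MAAS/metadata/"))
# -> http://192.168.0.234/MAAS/metadata/

# Host header forwarded with the port intact:
print(build_absolute_uri_sketch("http", "192.168.0.234:5248", "/MAAS/metadata/"))
# -> http://192.168.0.234:5248/MAAS/metadata/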

Revision history for this message
Michal Kielkowski (mikiel) wrote :

No load balancer, no reverse proxy (except for MAAS's own nginx). I use an external DHCP scenario.
I see this error consistently across my 2 setups: one in an AWS VPC, one in my private virtual lab. Let me know if you need additional info or traces/logs.

Network config from the lab env (netplan, Ubuntu Server 22.04):
network:
  ethernets:
    eth0:
      addresses:
        - 192.168.0.234/24
      nameservers:
        addresses: [192.168.0.1]
        search: [lan]
      routes:
        - to: default
          via: 192.168.0.1
  version: 2

root@maas:/var/log/maas# ip a
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP group default qlen 1000
    link/ether 00:15:5d:38:01:29 brd ff:ff:ff:ff:ff:ff
    inet 192.168.0.234/24 brd 192.168.0.255 scope global eth0
       valid_lft forever preferred_lft forever
    inet6 fd77:eeee:7bdb::64f/128 scope global dynamic noprefixroute
       valid_lft 42713sec preferred_lft 42713sec
    inet6 fd77:eeee:7bdb:0:215:5dff:fe38:129/64 scope global mngtmpaddr noprefixroute
       valid_lft forever preferred_lft forever
    inet6 fe80::215:5dff:fe38:129/64 scope link
       valid_lft forever preferred_lft forever
root@maas:/var/log/maas#

Revision history for this message
Alexsander de Souza (alexsander-souza) wrote :

We need to update the nginx configuration from

proxy_set_header Host $host;

to

proxy_set_header Host $http_host;

otherwise we lose the port information.
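
For illustration, a minimal sketch of that change in context. The server/location layout and the upstream address below are assumptions for the sketch, not the actual MAAS template:

server {
    listen 5248;

    location /MAAS/ {
        # $http_host forwards the client's original "host:port" value;
        # $host carries only the host name, so the port is dropped.
        proxy_set_header Host $http_host;
        proxy_pass http://localhost:5240;
    }
}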

Changed in maas:
status: New → Triaged
importance: Undecided → High
milestone: none → 3.5.0
Changed in maas:
assignee: nobody → Anton Troyanov (troyanov)
Changed in maas:
status: Triaged → Fix Committed
Revision history for this message
Yuriy Tabolin (olddanmer) wrote :

I have the same issue on 3.3.4.

A workaround works for me: add 'listen 80;' to /usr/lib/python3/dist-packages/provisioningserver/templates/http/rackd.nginx.conf.template and restart maas-rackd.
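
For illustration, a sketch of where such a directive sits; the actual rackd.nginx.conf.template also contains the proxy locations, which are omitted here:

server {
    listen 5248;
    listen 80;  # workaround: also answer on port 80, where the port-less metadata_url ends up
}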

Revision history for this message
Anton Troyanov (troyanov) wrote :

Hi Yuri,

The fix was backported to the 3.3 branch after 3.3.4 was released (that's why it is affected).

The version that should have the fix is 3.3.5 (however, it is not released yet).

Revision history for this message
Alan Baghumian (alanbach) wrote :

I encountered this issue with MAAS 3.3.4 today; simply upgrading to 3.3.5 from the edge channel resolved it.

Changed in maas:
milestone: 3.5.0 → 3.5.0-beta1
status: Fix Committed → Fix Released
Revision history for this message
Youhei Tooyama (VirtualTech Japan) (ytooyama-virtualtech) wrote (last edit ):

MAAS 3.3.6 has been released, but this problem still reproduces.

When deploying, the following error can be observed.

```
handlers.py[WARNING]: Failed posting event: {"name": "init-local/check-cache", "description": "attempting to read from cache [trust]", "event_type": "start", "origin": "cloudinit", "timestamp": 1711397389.92540241}. This was caused by: HTTPConnectionPool(host='172.17.28.3', port=5248): Max retries exceeded with url: /MAAS/metadata/status/wysrbn (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x7f2961a45d50>: Failed to establish a new connection: [Errno 101] Network is unreachable'))
```

The workaround is to run the following:

```
sudo systemctl restart maas-proxy.service maas-rackd
```

Revision history for this message
Anton Troyanov (troyanov) wrote :

Hello Youhei Tooyama,

I don't think it is the same issue.

If I understand correctly, that error was captured from the machine?
Could it be that there was an issue with rackd around that time?
Is there anything interesting in rackd.log?

Revision history for this message
Youhei Tooyama (VirtualTech Japan) (ytooyama-virtualtech) wrote :

Hello Anton Troyanov,

After updating to MAAS 3.3.6, the following logs started appearing.

regiond.log

```
2024-03-14 08:19:20 maasserver: [error] Error while calling ScanNetworks: Unable to get RPC connection for rack controller 'maas4' (erpd7f).
2024-03-14 08:19:20 maasserver.regiondservices.active_discovery: [info] Active network discovery: Unable to initiate network scanning on any rack controller. Verify that the rack controllers are started and have connected to the region.
2024-03-14 08:19:21 regiond: [info] 127.0.0.1 POST /MAAS/metadata/status/wysrbn HTTP/1.1 --> 204 NO_CONTENT (referrer: -; agent: python-requests/2.25.1)
2024-03-14 08:19:21 regiond: [info] 127.0.0.1 POST /MAAS/metadata/status/wysrbn HTTP/1.1 --> 204 NO_CONTENT (referrer: -; agent: python-requests/2.25.1)
2024-03-14 08:19:21 regiond: [info] 127.0.0.1 POST /MAAS/metadata/status/wysrbn HTTP/1.1 --> 204 NO_CONTENT (referrer: -; agent: python-requests/2.25.1)
2024-03-14 08:19:21 regiond: [info] 127.0.0.1 POST /MAAS/metadata/status/wysrbn HTTP/1.1 --> 204 NO_CONTENT (referrer: -; agent: python-requests/2.25.1)
2024-03-14 08:19:21 regiond: [info] 127.0.0.1 POST /MAAS/metadata/status/wysrbn HTTP/1.1 --> 204 NO_CONTENT (referrer: -; agent: python-requests/2.25.1)
2024-03-14 08:19:21 regiond: [info] 127.0.0.1 POST /MAAS/metadata/status/wysrbn HTTP/1.1 --> 204 NO_CONTENT (referrer: -; agent: python-requests/2.25.1)
2024-03-14 08:19:21 regiond: [info] 127.0.0.1 POST /MAAS/metadata/status/wysrbn HTTP/1.1 --> 204 NO_CONTENT (referrer: -; agent: python-requests/2.25.1)
2024-03-14 08:19:21 regiond: [info] 127.0.0.1 POST /MAAS/metadata/status/wysrbn HTTP/1.1 --> 204 NO_CONTENT (referrer: -; agent: python-requests/2.25.1)
2024-03-14 08:19:21 regiond: [info] 127.0.0.1 POST /MAAS/metadata/status/wysrbn HTTP/1.1 --> 204 NO_CONTENT (referrer: -; agent: python-requests/2.25.1)
2024-03-14 08:19:21 regiond: [info] 127.0.0.1 POST /MAAS/metadata/status/wysrbn HTTP/1.1 --> 204 NO_CONTENT (referrer: -; agent: python-requests/2.25.1)
2024-03-14 08:19:21 regiond: [info] 127.0.0.1 POST /MAAS/metadata/status/wysrbn HTTP/1.1 --> 204 NO_CONTENT (referrer: -; agent: python-requests/2.25.1)
2024-03-14 08:19:21 regiond: [info] 127.0.0.1 POST /MAAS/metadata/status/wysrbn HTTP/1.1 --> 204 NO_CONTENT (referrer: -; agent: python-requests/2.25.1)
2024-03-14 08:19:21 regiond: [info] 127.0.0.1 POST /MAAS/metadata/status/wysrbn HTTP/1.1 --> 204 NO_CONTENT (referrer: -; agent: python-requests/2.25.1)
2024-03-14 08:19:21 regiond: [info] 127.0.0.1 POST /MAAS/metadata/status/wysrbn HTTP/1.1 --> 204 NO_CONTENT (referrer: -; agent: python-requests/2.25.1)
2024-03-14 08:19:21 regiond: [info] 127.0.0.1 POST /MAAS/metadata/status/wysrbn HTTP/1.1 --> 204 NO_CONTENT (referrer: -; agent: python-requests/2.25.1)
2024-03-14 08:19:21 regiond: [info] 127.0.0.1 POST /MAAS/metadata/status/wysrbn HTTP/1.1 --> 204 NO_CONTENT (referrer: -; agent: python-requests/2.25.1)
2024-03-14 08:19:21 regiond: [info] 127.0.0.1 POST /MAAS/metadata/status/wysrbn HTTP/1.1 --> 204 NO_CONTENT (referrer: -; agent: python-requests/2.25.1)
2024-03-14 08:19:21 regiond: [info] 127.0.0.1 POST /MAAS/metadata/status/wysrbn HT...


Revision history for this message
Youhei Tooyama (VirtualTech Japan) (ytooyama-virtualtech) wrote :

Oh.. After re-commissioning the server and re-syncing the image, the problem was resolved.
Sorry!
