[2.1,2.2] Anonymous auto-enlistment fails to contact metadata service

Bug #1665459 reported by Spyderdyne
Affects    Status        Importance  Assigned to    Milestone
MAAS       Fix Released  High        Mike Pontillo  2.2.0
MAAS 2.1   Fix Released  High        Mike Pontillo  2.1.4

Bug Description

Per the attached diagram.

MaaS is receiving DHCP PXE calls via the router's DHCP helper address/forwarder, but fails to boot nodes for acceptance.

Region:

root@juju-rack2:/var/log/maas# cat /etc/issue
Ubuntu 16.04.2 LTS \n \l

root@juju-rack2:/var/log/maas# uname -a
Linux juju-rack2.home.spyderdyne.net 4.4.43-v7+ #948 SMP Sun Jan 15 22:20:07 GMT 2017 armv7l armv7l armv7l GNU/Linux

Ping to the rack controller:

root@juju-rack2:/var/log/maas# ping 192.168.199.6
PING 192.168.199.6 (192.168.199.6) 56(84) bytes of data.
64 bytes from 192.168.199.6: icmp_seq=1 ttl=64 time=0.400 ms
64 bytes from 192.168.199.6: icmp_seq=2 ttl=64 time=0.402 ms
^C
--- 192.168.199.6 ping statistics ---
2 packets transmitted, 2 received, 0% packet loss, time 999ms
rtt min/avg/max/mdev = 0.400/0.401/0.402/0.001 ms

root@juju-rack2:/var/log/maas# dpkg -s maas-region-controller
Package: maas-region-controller
Status: install ok installed
Priority: optional
Section: net
Installed-Size: 52
Maintainer: Ubuntu Developers <email address hidden>
Architecture: all
Source: maas
Version: 2.2.0~beta2+bzr5717-0ubuntu1~16.04.1
Depends: avahi-utils, dbconfig-pgsql, iputils-ping, maas-dns (= 2.2.0~beta2+bzr5717-0ubuntu1~16.04.1), maas-region-api (= 2.2.0~beta2+bzr5717-0ubuntu1~16.04.1), postgresql (>= 9.1), tcpdump, debconf (>= 0.5) | debconf-2.0
Recommends: openssh-server
Suggests: nmap
Description: Region Controller for MAAS
 The MAAS region controller (maas-regiond) is the REST API server for
 all MAAS clients, and the postgres database that maintains machine
 state for the entire data centre (or "region"). The region controller
 can be scaled-out and highly available given the appropriate postgres
 setup and additional API servers.
 .
 This package installs the postgres database and the API server, so it
 is appropriate for the initial installation of a new MAAS region. To
 scale out the controller or make it highly available, install
 maas-region-controller-api on additional servers and ensure the
 postgres database is HA too.
Homepage: http://maas.io/

Rack:

root@rack2-maas-rack0:~# cat /etc/issue
Ubuntu 16.10 \n \l

root@rack2-maas-rack0:~# uname -a
Linux rack2-maas-rack0 4.8.0-37-generic #39-Ubuntu SMP Thu Jan 26 02:27:07 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux

Ping to the region controller:

root@rack2-maas-rack0:~# ping 192.168.199.2
PING 192.168.199.2 (192.168.199.2) 56(84) bytes of data.
64 bytes from 192.168.199.2: icmp_seq=1 ttl=64 time=0.422 ms
64 bytes from 192.168.199.2: icmp_seq=2 ttl=64 time=0.414 ms
^C
--- 192.168.199.2 ping statistics ---
2 packets transmitted, 2 received, 0% packet loss, time 1025ms
rtt min/avg/max/mdev = 0.414/0.418/0.422/0.004 ms

root@rack2-maas-rack0:~# dpkg -s maas-rack-controller
Package: maas-rack-controller
Status: install ok installed
Priority: optional
Section: net
Installed-Size: 103
Maintainer: Ubuntu Developers <email address hidden>
Architecture: all
Source: maas
Version: 2.2.0~beta2+bzr5717-0ubuntu1~16.10.1
Replaces: maas-cluster-controller, python-maas-provisioningserver
Depends: authbind, avahi-utils, bind9utils, distro-info, freeipmi-tools, grub-common, iputils-ping, maas-cli (= 2.2.0~beta2+bzr5717-0ubuntu1~16.10.1), maas-common (= 2.2.0~beta2+bzr5717-0ubuntu1~16.10.1), maas-dhcp (= 2.2.0~beta2+bzr5717-0ubuntu1~16.10.1), ntp, pxelinux | syslinux-common (<< 3:6.00~pre4+dfsg-5), python3-httplib2, python3-maas-provisioningserver (= 2.2.0~beta2+bzr5717-0ubuntu1~16.10.1), python3-netaddr, python3-tempita, python3-twisted, python3-zope.interface, syslinux-common, tcpdump, tgt, uuid-runtime, wget, debconf (>= 0.5) | debconf-2.0, init-system-helpers (>= 1.18~), python3:any (>= 3.5~)
Suggests: amtterm, ipmitool, libvirt-bin, nmap, wsmancli
Breaks: maas-cluster-controller, python-maas-provisioningserver
Conflicts: tftpd-hpa
Conffiles:
 /etc/logrotate.d/maas-rack-controller 22fcb01a80fe77c722ab6ca9c78de11d
 /etc/sudoers.d/99-maas-sudoers c79448e89b644bf67f3dd5d430392f85
Description: Rack Controller for MAAS
 The MAAS rack controller (maas-rackd) provides highly available, fast
 and local broadcast services to the machines provisioned by MAAS. You
 need a MAAS rack controller attached to each fabric (which is a set of
 trunked switches). You can attach multiple rack controllers to these
 physical networks for high availability, with secondary rack controllers
 automatically stepping in to provide these services if the primary rack
 controller fails.
 .
 A common configuration is to have a rack controller in each rack, with
 a fast primary network interface to the rack switch and secondary
 network interfaces on one or two other nearby racks for high
 availability redundancy.
 .
 This package depends on the necessary components to provide iSCSI,
 DHCP, TFTP and power management.
Homepage: http://maas.io/

Dashboard MaaS Version:

MAAS Version 2.2.0 (beta2+bzr5717)

Tags: pxe devel maas

Revision history for this message
Spyderdyne (spyderdyne) wrote :

It attempts to use a self-assigned (link-local) IP address to fetch OS components (image attached). I am not sure how much of this is normal expected behavior, having not seen this work correctly before.

Revision history for this message
Spyderdyne (spyderdyne) wrote :

When the node is manually added to the zone as a machine entry with a MAC address, PXE initially resolves to MaaS and the state changes to Commissioning, but then all subsequent calls to the MaaS rack controller ignore the DHCP helper address and apparently expect MaaS to be the gateway (picture of this attached as well).

MaaS stable's iSCSI target dies for no apparent reason, and now MaaS devel is unable to complete the PXE process. At this time there is apparently no version of MaaS that works in this setup. There are so many things to check that I am unsure where to begin.

Spyderdyne (spyderdyne)
description: updated
Revision history for this message
Blake Rouse (blake-rouse) wrote :

So the issue here is that the deploying node wants to talk to the region controller over the same IP address that the rack controller uses to talk to the region controller. Based on your report, you're saying that the rack controller has two interfaces: one to talk to the region and another for the machines to boot from?

The MAAS architecture expects the deploying machines to be able to talk directly to the region controller. There is a way to configure your systems to work this way, but I want to be sure that this is the problem you are experiencing.

Revision history for this message
Spyderdyne (spyderdyne) wrote : Re: [Bug 1665459] Re: MaaS devel bad request routing on PXE

Same network.

Nodes can talk to either MaaS device on this trunk over the untagged default net. All MaaS hosts and PXE hosts are on the same net (per the diagram). The rack controller handles DHCP and PXE, and then, I have to assume, also provides the iSCSI target to the PXE node, or we wouldn't be syncing images to the rack controllers, right?

My understanding was that only the rack controller needed to be able to reach the region controller for management functions, but all nodes can hit both in this scenario regardless.

Here is the PXE process now (from the screenshots I attached):

1. New node starts and makes the PXE call. The DHCP call hits the egress gateway router (192.168.199.1), which then sends the request to the DHCP forwarder address of the MaaS rack controller on the same untagged network (192.168.199.6), because it is the device providing DHCP for hosts that are not already statically assigned.

2. The MaaS rack controller accepts the request, assigns an IP address from the dynamic pool, and starts the PXE process.

3. The PXE node completes the cloud-init step, reaches the "Reached Target Network." step, and starts the image download process.

4. PXE nodes that do not already have a machine entry in MaaS for their MAC address attempt to download the image from:

"http://169.254.169.254/2009-04-04/meta-data/instance-id"

region controller: 192.168.199.2
rack controller: 192.168.199.6
someone's link-local address: 169.254.169.254

Per the image, this connection fails until PXE exits.

5. Manually adding a machine entry to the region controller seems to change this behavior, causing the call to hit the default gateway instead of whatever machine's link-local metadata address it was searching for before (second image provided), and it likewise fails because the egress router does not host this metadata service.

Does this make sense at all?


Revision history for this message
Spyderdyne (spyderdyne) wrote : Re: MaaS devel bad request routing on PXE

From the dashboard:

Latest machine events
View full history
Event Time
Queried node's BMC - Power state queried: unknown Thu, 16 Feb. 2017 15:45:23
Node changed status - From 'Commissioning' to 'Failed commissioning' Thu, 16 Feb. 2017 15:45:02
Marking node failed - Machine operation 'Commissioning' timed out after 20 minutes. Thu, 16 Feb. 2017 15:45:02
Node changed status - From 'New' to 'Commissioning' Thu, 16 Feb. 2017 15:22:59
User starting node commissioning - (spyderdyne) Thu, 16 Feb. 2017 15:22:59

Machine output Commissioning Output
Filename Time Output
00-maas-01-cpuinfo Thu, 16 Feb. 2017 15:22:59 0 lines
00-maas-01-lshw Thu, 16 Feb. 2017 15:22:59 0 lines
00-maas-02-virtuality Thu, 16 Feb. 2017 15:22:59 0 lines
00-maas-03-install-lldpd Thu, 16 Feb. 2017 15:22:59 0 lines
00-maas-04-list-modaliases Thu, 16 Feb. 2017 15:22:59 0 lines
00-maas-06-dhcp-unconfigured-ifaces Thu, 16 Feb. 2017 15:22:59 0 lines
00-maas-07-block-devices Thu, 16 Feb. 2017 15:22:59 0 lines
99-maas-01-wait-for-lldpd Thu, 16 Feb. 2017 15:22:59 0 lines
99-maas-02-capture-lldp Thu, 16 Feb. 2017 15:22:59 0 lines
99-maas-03-network-interfaces Thu, 16 Feb. 2017 15:22:59 0 lines
99-maas-04-network-interfaces-with-sriov Thu, 16 Feb. 2017 15:22:59 0 lines

Revision history for this message
Blake Rouse (blake-rouse) wrote :

To fix the issue:

Edit /etc/maas/rackd.conf and change maas_url:

maas_url=http://{ip_rack_and_machines_can_reach}/MAAS

Restart the maas-rackd service:

sudo systemctl restart maas-rackd

That will ensure that machines booting from that rack controller talk to the region controller on an IP address they can reach.
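
(Note for anyone applying this: /etc/maas/rackd.conf is a small YAML file, so the setting uses a colon rather than key=value. In this report's setup the edited file would look roughly like the excerpt below, which mirrors the file contents Spyderdyne posts later in the thread; the cluster_uuid is specific to that install.)

cluster_uuid: 7b9740f2-eb47-40b3-94da-be3beb44f755
maas_url: http://192.168.199.2:5240/MAAS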

Revision history for this message
Spyderdyne (spyderdyne) wrote :

Here is a video of what is happening when we boot up:

https://www.youtube.com/watch?v=3zXYZqURtjk

Changed in maas:
status: New → Invalid
Revision history for this message
Spyderdyne (spyderdyne) wrote :

root@rack2-maas-rack0:~# cat /etc/maas/rackd.conf
cluster_uuid: 7b9740f2-eb47-40b3-94da-be3beb44f755
maas_url: http://192.168.199.2:5240/MAAS

already set correctly, but apparently being ignored?

Spyderdyne (spyderdyne)
Changed in maas:
status: Invalid → New
Revision history for this message
Mike Pontillo (mpontillo) wrote :

It sounds like you are using DHCP relay in your environment. (This is a new feature in MAAS 2.2.) Have you set the relay_vlan for the deployment VLAN so that it matches the VLAN where DHCP is running?

For example, after logging into the MAAS CLI, if the untagged VLAN on fabric-0 is having its DHCP requests forwarded to the VLAN with id=5003, you would do:

maas <profile> vlan update fabric-0 0 relay_vlan=5003
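
(To inspect the current relay configuration before changing anything, the same VLAN can be read back with the CLI; a sketch following the command above, where fabric-0 and VID 0 identify the untagged VLAN and the exact output fields may vary by MAAS version:)

maas <profile> vlan read fabric-0 0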

Revision history for this message
Spyderdyne (spyderdyne) wrote :

Not using DHCP relay per se. Just setting a DHCP forwarder address on the router so that when a DHCP request hits the gateway it is forwarded to the correct host. This is simply a requirement when running a DHCP host other than the gateway, and should be part of every MaaS deployment unless someone somewhere is also using their MaaS controllers as Linux routers instead of hardware routers.

DHCP is also up and working just fine, and all hosts in my network (including the host that is PXE booting; it always gets 192.168.199.130) are getting DHCP addresses from the MaaS range I defined for this flat network.

Revision history for this message
Spyderdyne (spyderdyne) wrote :

The maas host and the client are on the same VLAN in other words.

Revision history for this message
Spyderdyne (spyderdyne) wrote :

The region controller, rack controller, and PXE host are in the same collision domain.

Revision history for this message
Spyderdyne (spyderdyne) wrote :

When I pulled the PXE host's MAC address from the ARP table of the rack controller and entered it into the dashboard manually as a machine, it still failed. Something in the PXE cloud-init process is failing, and it appears to be looking for an IaaS metadata service address (like AWS would provide) to forward these requests to the MaaS metadata service, but that capability wouldn't exist on a non-overlay hardware network.

Revision history for this message
Spyderdyne (spyderdyne) wrote :

I set a static route on the router to forward any requests addressed to 169.254.169.254 to the region controller at 192.168.199.2, but this still fails. This appears to be something odd happening at layer 2 of the network stack, not something I can fix with a static route.

Revision history for this message
Blake Rouse (blake-rouse) wrote :

At 11 seconds into that video you can see that the machine gets the correct cloud-config-url with the IP address above. The IP address that the DHCP server handed out is also in the same subnet as that IP address.

So the only reason it would not be getting the cloud config is that the machine does not have network access to the region controller.

Revision history for this message
Mike Pontillo (mpontillo) wrote :

Ah, I would have expected that DHCP would simply be turned off on the gateway on that interface, but that should be okay.

Stepping frame-by-frame through the video I can see that:

 - [0:13] The cloud-config-url is set to http://192.168.199.2:5240/MAAS/... so your maas_url is not being ignored.
 - [0:13] The machine is attempting to enlist (enlist-preseed is present in the URL, and you see 'maas-enlist' as the hostname in the ip= parameter)
 - [0:25] The enlisting machine is assigned an appropriate IP address (192.168.122.103)
 - [2:46] cloud-init on the enlisting machine gives up on reaching the MAAS metadata service ("[CRITICAL]: Giving up on md from ['http://192.168.199.133:5240/MAAS/metadata/enlist/2012-03-01/meta-data/instance-id'] after 124 seconds"). Sadly the camera was not pointed at the screen immediately before that, so I don't know whether any other useful errors scrolled by earlier.

Then cloud-init falls back to looking for other metadata providers, such as the link-local provider you see at 169.254.169.254 (I think this is used for AWS and/or other cloud hosts).

I attempted to reproduce this issue on my local MAAS test bed and confirmed that I see the same issue. It looks like anonymous enlistment is not working in the current MAAS 2.2 beta.

You might have more success if you add the node manually (by expected boot MAC) and then commission it.
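
(For anyone debugging a similar enlistment failure, a quick way to tell which of these endpoints the booting machine can actually reach is a plain TCP connect test from its console; a minimal Python 3 sketch, using the two addresses visible in the video above: the unreachable 192.168.199.133:5240 metadata URL and the 169.254.169.254 link-local fallback.)

import socket

# Addresses taken from the boot output discussed above; adjust for your setup.
targets = [("192.168.199.133", 5240), ("169.254.169.254", 80)]

for host, port in targets:
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    sock.settimeout(3)
    try:
        sock.connect((host, port))
        print(f"{host}:{port} reachable")
    except OSError as exc:
        print(f"{host}:{port} unreachable: {exc}")
    finally:
        sock.close()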

Changed in maas:
status: New → Triaged
importance: Undecided → High
summary: - MaaS devel bad request routing on PXE
+ [2.2] Anonymous auto-enlistment fails to contact metadata service
Revision history for this message
Mike Pontillo (mpontillo) wrote : Re: [2.2] Anonymous auto-enlistment fails to contact metadata service

Er. Wait a minute. I thought I had transcribed that wrong, but I wonder why it is trying to talk to .133? I thought MAAS was at the .2 address.

Revision history for this message
Mike Pontillo (mpontillo) wrote :

Interestingly, in my test bed, I see it "Giving up on md from..." as well, and I also thought the IP address was correct. But it turned out to be an IP address on a network behind a router, unreachable from the enlisting node.

So it seems the bug is that MAAS is providing an unreachable IP address to enlisting nodes.

What does that .133 address in your setup belong to?

Revision history for this message
Spyderdyne (spyderdyne) wrote :

Not sure. maas-region is at .2, maas-rack at .6. The device discovery doesn't know anything about a .199.133, and it literally doesn't exist on the network in anybody's ARP tables.

Revision history for this message
Mike Pontillo (mpontillo) wrote :

Can you open a browser and hit:

http://<maas-ip>:5240/MAAS/metadata/latest/enlist-preseed/?op=get_enlist_preseed

For me I see "metadata_url: http://192.168.122.190:5240/MAAS/metadata/enlist", and 192.168.122.190 is the IP address of my testbed MAAS on its management network, not a network that the PXE booting/commissioning/deploying nodes can reach.
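
(The same check can be scripted rather than done in a browser; a minimal Python 3 sketch, assuming the region is reachable at the 192.168.199.2 address from this report and using the enlist-preseed URL given above, which is served without authentication:)

import urllib.request

# Enlist-preseed URL from the comment above; substitute your own region IP.
url = ("http://192.168.199.2:5240/MAAS/metadata/latest/"
       "enlist-preseed/?op=get_enlist_preseed")

with urllib.request.urlopen(url, timeout=10) as response:
    preseed = response.read().decode("utf-8")

# The interesting line is metadata_url; it should point at an address
# that the PXE-booting machines can actually reach.
for line in preseed.splitlines():
    if "metadata_url" in line:
        print(line)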

Revision history for this message
Mike Pontillo (mpontillo) wrote :

Also, would you please tell me the output of the following:

$ sudo maas-region shell
>>> from maasserver.preseed import get_preseed_context
>>> from pprint import pprint
>>> from maasserver.models import RackController
>>> pprint([get_preseed_context(rack_controller=rack) for rack in RackController.objects.all()])

For me I see this:

[{'metadata_enlist_url': 'http://172.16.99.2:5240/metadata/enlist',
  'osystem': '',
  'release': '',
  'server_host': '172.16.99.2',
  'server_url': 'http://172.16.99.2:5240/api/2.0/machines/',
  'syslog_host_port': '172.16.99.2:514'}]

Which doesn't match the .190 address, so that seems to be the bug; this data isn't making it into the preseed delivered to the enlisting node.

Revision history for this message
Spyderdyne (spyderdyne) wrote :

So at this point I have to run juju devel to avoid using the juju LXD as a router/default gateway, because the fix is in devel but nobody knows when it will hit the stable branch. Also, the MaaS stable iSCSI server dies on PXE (not sure how that makes a stable release) and MaaS devel can't PXE.

I am building this thing out to demonstrate to prospective employers that I can build an entire cloud data center from scratch. Since the Canonical stack is literally too broken to use now I guess I will have to deploy RHEL kickstart or Crowbar for this instead, or nobody will believe I know how to do this. It's too bad too. I even made a MAAS label for the MaaS Raspberry Pi power control switch.

Not to be preachy, but I have been deploying this stack in one form or another for evaluation purposes in enterprise environments for around 3 years now, and it still isn't stable enough to be deployed on a dev platform. This stuff has definitely come a long way over time, but it is more art than science since it is still unusable.

I am a huge fan of these components and features, but at some point we might want to ask if we are building a working commodity, or if we are just developing for the sake of development. I personally keep hoping that your products will exist in a usable form one day when I am in a position to use them professionally, because to be honest they are fun to work with. Unfortunately those opportunities are rare but I always at least try to check in each time to see if these projects are ready and will always continue to do so.

I will watch and see if this gets fixed soon, but right now this is actually making me look bad and if I don't impress someone to hire me soon, I'm literally going to lose my house.

Thanks for your help. Will keep checking back in when I can.

Revision history for this message
Mike Pontillo (mpontillo) wrote :

Actually, comment 21 might provide misleading results.

In metadataserver/api.py, when we go to get the enlistment preseed, a call to get_enlist_preseed() should happen. Then find_rack_controller(request) should be called, where based on the request's IP address (REMOTE_ADDR) we should be able to find and return the primary rack for the subnet.

Since MAAS finds the subnet for the rack controller based on the request IP, I decided to check what request IP address the request object contained. For me, it contained an IPv4-mapped IPv6 address. I then checked if `Subnet.objects.get_best_subnet_for_ip()` handled that[1].

It turns out that it does not. Therefore this code cannot locate a rack controller to use as the "MAAS-facing" rack controller when we go to generate the preseed, and as a result an incorrect rack-facing IP address might be chosen for PXE boot requests.

This issue is most likely present in the latest MAAS 2.1.x release as well.

[1]:
http://paste.ubuntu.com/24010660/
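
(To make the failure mode concrete, here is a minimal sketch of the missing normalization using netaddr, the IP library MAAS already depends on; the .130 address is the PXE host's lease mentioned earlier in this thread:)

from netaddr import IPAddress

# REMOTE_ADDR can arrive as an IPv4-mapped IPv6 address when the request
# comes in over an IPv6 socket.
addr = IPAddress("::ffff:192.168.199.130")
print(addr.is_ipv4_mapped())   # True

# Without converting back to plain IPv4, a lookup keyed on IPv4 subnets
# (such as get_best_subnet_for_ip) cannot match this address.
if addr.is_ipv4_mapped():
    addr = addr.ipv4()
print(addr)                    # 192.168.199.130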

Revision history for this message
Mike Pontillo (mpontillo) wrote :

I'm sorry you're frustrated. We're a small team, but I will say that we produce a product that is indeed used in production both internally at Canonical and externally. But you're on the bleeding edge of a complex technology stack, so I'm not surprised this came up. In any case, thanks for your feedback, and for testing MAAS.

(And for the record, others on the team tried to replicate the issue, and I was the only one who also saw it occur.)

Anyway, if you want to give it a try, this patch fixes the issue for me:

http://paste.ubuntu.com/24010744/

You can just edit the code accordingly (as specified by the patch) in /usr/lib/python3/dist-packages if you want to give it a try.

I'll try to get this into the next beta release (and backported to MAAS 2.1). Please let me know if this fixes things up for you.

summary: - [2.2] Anonymous auto-enlistment fails to contact metadata service
+ [2.1,2.2] Anonymous auto-enlistment fails to contact metadata service
Changed in maas:
assignee: nobody → Mike Pontillo (mpontillo)
milestone: none → 2.2.0
Revision history for this message
Spyderdyne (spyderdyne) wrote : Re: [Bug 1665459] Re: [2.2] Anonymous auto-enlistment fails to contact metadata service

I am on the bleeding edge because the stable branch iSCSI target service is broken. I'm sorry if I sound negative. You guys are awesome. I have just never found this project in a usable state for any of my environments. That is not to imply that it has never worked, or that it doesn't work at all in any environment. I only mean that every time I pick it up there is something seriously wrong with it.

I seem to have a knack for finding bugs wherever I go. I will keep poking at it. I may actually be too committed at this point to do anything else and will still be testing.

I will probably just punt tomorrow and manually install the OS. Your competitors aren't exactly known for their deep armhf support, and that also matters to me at the moment.

Since I can manually add and remove nodes in juju, and juju can literally recreate an entire OpenStack deployment on LXD blades with ZFS in minutes, it's not too hard to flip between them any more.

I may just relegate docker to the third blade so I can enjoy deleting it over and over.

;)


Changed in maas:
status: Triaged → Fix Committed
Revision history for this message
Spyderdyne (spyderdyne) wrote : Re: [Bug 1665459] Re: [2.1, 2.2] Anonymous auto-enlistment fails to contact metadata service

I will update and test this afternoon.


Revision history for this message
OpenStack (andy1723) wrote :

Hi all,

I am experiencing exactly the same issue with my installation.

It is:

Ubuntu 16.04.2 LTS

ii maas 2.1.3+bzr5573-0ubuntu1~16.04.1 all "Metal as a Service" is a physical cloud and IPAM
ii maas-cli 2.1.3+bzr5573-0ubuntu1~16.04.1 all MAAS client and command-line interface
ii maas-common 2.1.3+bzr5573-0ubuntu1~16.04.1 all MAAS server common files
ii maas-dhcp 2.1.3+bzr5573-0ubuntu1~16.04.1 all MAAS DHCP server
ii maas-dns 2.1.3+bzr5573-0ubuntu1~16.04.1 all MAAS DNS server
ii maas-proxy 2.1.3+bzr5573-0ubuntu1~16.04.1 all MAAS Caching Proxy
ii maas-rack-controller 2.1.3+bzr5573-0ubuntu1~16.04.1 all Rack Controller for MAAS
ii maas-region-api 2.1.3+bzr5573-0ubuntu1~16.04.1 all Region controller API service for MAAS
ii maas-region-controller 2.1.3+bzr5573-0ubuntu1~16.04.1 all Region Controller for MAAS
ii python3-django-maas 2.1.3+bzr5573-0ubuntu1~16.04.1 all MAAS server Django web framework (Python 3)
ii python3-maas-client 2.1.3+bzr5573-0ubuntu1~16.04.1 all MAAS python API client (Python 3)
ii python3-maas-provisioningserver 2.1.3+bzr5573-0ubuntu1~16.04.1 all MAAS server provisioning libraries (Python 3)

I've implemented Blake's suggestion:

Edit /etc/maas/rackd.conf and change maas_url:

maas_url=http://{ip_rack_and_machines_can_reach}/MAAS

Restart the maas-rackd service:

sudo systemctl restart maas-rackd

and applied the patch recommended by Mike:

=== modified file 'src/maasserver/models/subnet.py'
--- src/maasserver/models/subnet.py 2017-01-04 17:45:19 +0000
+++ src/maasserver/models/subnet.py 2017-02-17 01:27:49 +0000
@@ -159,6 +159,9 @@
     def get_best_subnet_for_ip(self, ip):
         """Find the most-specific managed Subnet the specified IP address
         belongs to."""
+        ip = IPAddress(ip)
+        if ip.is_ipv4_mapped():
+            ip = ip.ipv4()
         subnets = self.raw(
             self.find_best_subnet_for_ip_query,
             params=[str(ip)])

The issue is still there.

I am very interested in making the system work and hope it can be fixed soon.

Thanks a lot,

Revision history for this message
Mike Pontillo (mpontillo) wrote :

@andy1723, if the issue persists after this patch is applied, then the root cause is likely different, so could you please open a new bug?

Be sure to include details of each region and rack controller on your system, which subnets are attached to each, if DHCP relay is in use or not, and which subnet the PXE boot occurs on.

Thanks in advance.
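
(Before opening a new bug, one way to confirm whether the subnet-lookup fix is actually active on a given region is to exercise the patched code path directly; a sketch reusing the maas-region shell approach from earlier in this thread, where the mapped .130 address is taken from this report and the expected result is an assumption based on the analysis above:)

$ sudo maas-region shell
>>> from maasserver.models import Subnet
>>> # An IPv4-mapped IPv6 address, like the REMOTE_ADDR seen during enlistment.
>>> Subnet.objects.get_best_subnet_for_ip("::ffff:192.168.199.130")
>>> # With the patch applied this should resolve to the machine's
>>> # 192.168.199.0/24 subnet; without it, no subnet is matched.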

Changed in maas:
status: Fix Committed → Fix Released