juju2 with maas 2.1.1 LXD containers get wrong ip addresses

Bug #1643057 reported by David Britton
This bug affects 4 people
Affects         Status        Importance  Assigned to    Milestone
Canonical Juju  Invalid       Undecided   Unassigned
MAAS            Fix Released  Critical    Mike Pontillo
MAAS 1.9        Won't Fix     Undecided   Unassigned
MAAS 2.0        Won't Fix     Undecided   Unassigned
MAAS 2.1        Fix Released  Critical    Mike Pontillo

Bug Description

Juju containers in a MAAS deployment are not getting externally routable IP addresses; something is wrong in the communication between juju 2.x and maas 2.1.1. This was not observed with maas 2.0.x.

juju status snippet:

Machine  State    DNS           Inst id                                    Series  AZ
0        started  10.96.13.224  node-32fdcb12-546c-11e4-b3f2-2c59e54ace74  xenial  dawn
0/lxd/0  started  10.0.0.65     juju-3bd45f-0-lxd-0                        xenial
0/lxd/1  started  10.0.0.56     juju-3bd45f-0-lxd-1                        xenial
0/lxd/2  started  10.0.0.205    juju-3bd45f-0-lxd-2                        xenial
0/lxd/3  started  10.0.0.113    juju-3bd45f-0-lxd-3                        xenial

Notice the 10.0.0.x addresses are not routable on this 10.96.0.0/17 network.
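
One quick way to confirm this is a minimal sketch with Python's standard `ipaddress` module. The addresses are copied from the status output above; the helper name `in_subnet` is made up for illustration.

```python
import ipaddress

# Subnet and addresses copied from the juju status output in this report.
maas_subnet = ipaddress.ip_network("10.96.0.0/17")
addresses = {
    "0": "10.96.13.224",
    "0/lxd/0": "10.0.0.65",
    "0/lxd/1": "10.0.0.56",
    "0/lxd/2": "10.0.0.205",
    "0/lxd/3": "10.0.0.113",
}


def in_subnet(address, network=maas_subnet):
    """Return True if the address belongs to the given network."""
    return ipaddress.ip_address(address) in network


for machine, address in addresses.items():
    status = "on subnet" if in_subnet(address) else "NOT on subnet"
    print(f"{machine:8} {address:14} {status}")
```

Only the host address falls inside 10.96.0.0/17; the container addresses do not.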

Versions:
juju: 2.0.0
maas: 2.1.1

The POST seems to get a 500 followed by an attempted removal:

From maas.log:

maas:
2016-11-18 20:17:32 -: [info] ::ffff:127.0.0.1 - - [18/Nov/2016:20:17:31 +0000] "GET /MAAS/api/2.0/machines/?agent_name=3239b94b-eaac-43d0-836d-baf63c3bd45f&id=node-32fdcb12-546c-11e4-b3f2-2c59e54ace74 HTTP/1.1" 200 2174 "-" "Go-http-client/1.1"
2016-11-18 20:17:32 -: [info] ::ffff:127.0.0.1 - - [18/Nov/2016:20:17:32 +0000] "POST /MAAS/api/2.0/devices/?op= HTTP/1.1" 200 504 "-" "Go-http-client/1.1"
2016-11-18 20:17:32 -: [info] ::ffff:127.0.0.1 - - [18/Nov/2016:20:17:32 +0000] "PUT /MAAS/api/2.0/nodes/rta8sc/interfaces/203535/ HTTP/1.1" 200 231 "-" "Go-http-client/1.1"
2016-11-18 20:17:32 maasserver: [error] ################################ Exception: Subnet matching query does not exist. ################################
2016-11-18 20:17:32 maasserver: [error] Traceback (most recent call last):
  File "/usr/lib/python3/dist-packages/django/forms/models.py", line 1218, in to_python
    value = self.queryset.get(**{key: value})
  File "/usr/lib/python3/dist-packages/django/db/models/query.py", line 334, in get
    self.model._meta.object_name
maasserver.models.subnet.DoesNotExist: Subnet matching query does not exist.

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/lib/python3/dist-packages/maasserver/fields.py", line 655, in to_python
    return super(SpecifierOrModelChoiceField, self).to_python(value)
  File "/usr/lib/python3/dist-packages/django/forms/models.py", line 1220, in to_python
    raise ValidationError(self.error_messages['invalid_choice'], code='invalid_choice')
django.core.exceptions.ValidationError: ['Select a valid choice. That choice is not one of the available choices.']

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/lib/python3/dist-packages/django/core/handlers/base.py", line 132, in get_response
    response = wrapped_callback(request, *callback_args, **callback_kwargs)
  File "/usr/lib/python3/dist-packages/maasserver/utils/views.py", line 177, in view_atomic_with_post_commit_savepoint
    return view_atomic(*args, **kwargs)
  File "/usr/lib/python3.5/contextlib.py", line 30, in inner
    return func(*args, **kwds)
  File "/usr/lib/python3/dist-packages/maasserver/api/support.py", line 55, in __call__
    response = upcall(request, *args, **kwargs)
  File "/usr/lib/python3/dist-packages/django/views/decorators/vary.py", line 21, in inner_func
    response = func(*args, **kwargs)
  File "/usr/lib/python3/dist-packages/piston3/resource.py", line 190, in __call__
    result = self.error_handler(e, request, meth, em_format)
  File "/usr/lib/python3/dist-packages/piston3/resource.py", line 188, in __call__
    result = meth(request, *args, **kwargs)
  File "/usr/lib/python3/dist-packages/maasserver/api/support.py", line 261, in dispatch
    return function(self, request, *args, **kwargs)
  File "/usr/lib/python3/dist-packages/maasserver/api/interfaces.py", line 595, in link_subnet
    if form.is_valid():
  File "/usr/lib/python3/dist-packages/django/forms/forms.py", line 184, in is_valid
    return self.is_bound and not self.errors
  File "/usr/lib/python3/dist-packages/django/forms/forms.py", line 176, in errors
    self.full_clean()
  File "/usr/lib/python3/dist-packages/django/forms/forms.py", line 392, in full_clean
    self._clean_fields()
  File "/usr/lib/python3/dist-packages/django/forms/forms.py", line 407, in _clean_fields
    value = field.clean(value)
  File "/usr/lib/python3/dist-packages/django/forms/fields.py", line 162, in clean
    value = self.to_python(value)
  File "/usr/lib/python3/dist-packages/maasserver/fields.py", line 665, in to_python
    return self.queryset.get(id=object_id)
  File "/usr/lib/python3/dist-packages/django/db/models/query.py", line 334, in get
    self.model._meta.object_name
maasserver.models.subnet.DoesNotExist: Subnet matching query does not exist.

2016-11-18 20:17:32 -: [info] ::ffff:127.0.0.1 - - [18/Nov/2016:20:17:32 +0000] "POST /MAAS/api/2.0/nodes/rta8sc/interfaces/203535/?op=link_subnet HTTP/1.1" 500 37 "-" "Go-http-client/1.1"
2016-11-18 20:17:33 -: [info] ::ffff:127.0.0.1 - - [18/Nov/2016:20:17:32 +0000] "DELETE /MAAS/api/2.0/devices/rta8sc/ HTTP/1.1" 204 - "-" "Go-http-client/1.1"

From Juju host system:

2016-11-18 20:15:38 DEBUG juju.network network.go:389 addresses after filtering: [local-machine:127.0.0.1 local-cloud:10.96.13.224 local-machine:::1]
2016-11-18 20:15:38 INFO juju.worker.machiner machiner.go:142 setting addresses for machine-0 to ["local-machine:127.0.0.1" "local-cloud:10.96.13.224" "local-machine:::1"]
2016-11-18 20:15:38 DEBUG juju.worker.logger logger.go:50 reconfiguring logging from "<root>=DEBUG" to "<root>=WARNING;unit=DEBUG"
2016-11-18 20:16:22 WARNING juju.provisioner lxd-broker.go:62 failed to prepare container "0/lxd/0" network config: linking device interface "eth0" to subnet "10.96.0.0/17" failed: unexpected: ServerError: 500 INTERNAL SERVER ERROR (Subnet matching query does not exist.)
2016-11-18 20:16:22 WARNING juju.provisioner broker.go:97 incomplete DNS config found, discovering host's DNS config
2016-11-18 20:17:08 WARNING juju.provisioner lxd-broker.go:62 failed to prepare container "0/lxd/1" network config: linking device interface "eth0" to subnet "10.96.0.0/17" failed: unexpected: ServerError: 500 INTERNAL SERVER ERROR (Subnet matching query does not exist.)
2016-11-18 20:17:08 WARNING juju.provisioner broker.go:97 incomplete DNS config found, discovering host's DNS config
2016-11-18 20:17:20 WARNING juju.provisioner lxd-broker.go:62 failed to prepare container "0/lxd/2" network config: linking device interface "eth0" to subnet "10.96.0.0/17" failed: unexpected: ServerError: 500 INTERNAL SERVER ERROR (Subnet matching query does not exist.)
2016-11-18 20:17:20 WARNING juju.provisioner broker.go:97 incomplete DNS config found, discovering host's DNS config
2016-11-18 20:17:33 WARNING juju.provisioner lxd-broker.go:62 failed to prepare container "0/lxd/3" network config: linking device interface "eth0" to subnet "10.96.0.0/17" failed: unexpected: ServerError: 500 INTERNAL SERVER ERROR (Subnet matching query does not exist.)
2016-11-18 20:17:33 WARNING juju.provisioner broker.go:97 incomplete DNS config found, discovering host's DNS config
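
To see which containers are affected on a larger deployment, the warnings can be pulled out of the machine log with a regular expression. This is only a sketch: the pattern is derived from the four log lines above, and the function name `link_failures` is hypothetical.

```python
import re

# Pattern derived from the lxd-broker warnings quoted in this report.
LINK_FAILURE = re.compile(
    r'failed to prepare container "(?P<container>[^"]+)" network config: '
    r'linking device interface "(?P<interface>[^"]+)" '
    r'to subnet "(?P<subnet>[^"]+)" failed'
)


def link_failures(log_text):
    """Return (container, interface, subnet) for each failed link attempt."""
    return [
        (m["container"], m["interface"], m["subnet"])
        for m in LINK_FAILURE.finditer(log_text)
    ]


# Two lines copied from the log excerpt; only the first one should match.
sample = (
    '2016-11-18 20:16:22 WARNING juju.provisioner lxd-broker.go:62 '
    'failed to prepare container "0/lxd/0" network config: linking device '
    'interface "eth0" to subnet "10.96.0.0/17" failed: unexpected: '
    'ServerError: 500 INTERNAL SERVER ERROR '
    '(Subnet matching query does not exist.)\n'
    '2016-11-18 20:16:22 WARNING juju.provisioner broker.go:97 '
    "incomplete DNS config found, discovering host's DNS config\n"
)

print(link_failures(sample))
```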

Andreas Hasenack (ahasenack) wrote :

subnets read

David Britton (dpb) wrote :

maas/*.log from maas server.

Andreas Hasenack (ahasenack) wrote :

shawmut node, where the containers were deployed to

David Britton (dpb) wrote :

juju-status from the busted deployment

tags: removed: kanban-cross-team
Mike Pontillo (mpontillo) wrote :

If I remember correctly, the root cause of this (juju specifying the subnet in a way that MAAS did not expect) was fixed by the following pull request:

https://github.com/juju/gomaasapi/pull/62

@juju team, was this fix released? (if so, what version of juju has the fix?)

Mike Pontillo (mpontillo) wrote :

Hm, actually, this may be a separate issue. We should triage this next week. Do you have a minimal test case for this that makes it easy to recreate? (in the past, I think we've used the Ubuntu charm to test similar use cases...)

Andreas Hasenack (ahasenack) wrote :

I repeated a minimal deploy:
juju bootstrap
juju switch controller
juju deploy ubuntu --to lxd:0

This is the POST that juju made to MAAS:

POST /MAAS/api/2.0/nodes/b7c4ry/interfaces/203538/?op=link_subnet HTTP/1.1
Host: 10.96.0.10
User-Agent: Go-http-client/1.1
Connection: close
Content-Length: 21
Authorization: OAuth oauth_nonce="a666ce8fcd13bfab0443b0eb9e8cf50c", oauth_signature_method="PLAINTEXT", oauth_version="1.0", realm="MAAS+API", oauth_consumer_key="e6KsgbnZPkEqQSD3Kp", oauth_token="QLGQwVy758bj3QvRHQ", oauth_signature="%268yA3RtQdGLEGrJqAr3hgTzPdyqQUvwVZ", oauth_timestamp="1479649961"
Content-Type: application/x-www-form-urlencoded
Accept-Encoding: gzip
Connection: close

mode=STATIC&subnet=10

HTTP/1.1 500 INTERNAL SERVER ERROR
Date: Sun, 20 Nov 2016 13:52:41 GMT
Server: TwistedWeb/16.0.0
X-Frame-Options: SAMEORIGIN
Vary: Cookie
Content-Type: text/plain; charset=utf-8
Connection: close
Transfer-Encoding: chunked

25
Subnet matching query does not exist.
0

Andreas Hasenack (ahasenack) wrote :

This is the relevant capture between the juju controller and the MAAS 2.1.1 server.

Andreas Hasenack (ahasenack) wrote :

Ok, here is a better workflow (also in
- juju creates a new device:
POST /MAAS/api/2.0/devices/?op= HTTP/1.1
hostname=juju-2988b5-0-lxd-2&mac_addresses=00%3A16%3A3e%3Afb%3A53%3Ab3&parent=node-10a0d7d4-39eb-11e5-ab72-2c59e54ace7

The response is:
{
    "system_id": "b7c4ry",
    "fqdn": "juju-2988b5-0-lxd-2.scapestack",
    "address_ttl": null,
    "parent": "node-10a0d7d4-39eb-11e5-ab72-2c59e54ace74",
    "resource_uri": "/MAAS/api/2.0/devices/b7c4ry/",
    "owner": "andreas",
    "ip_addresses": [],
    "interface_set": [
        {
            "name": "eth0",
            "params": "",
            "links": [],
            "enabled": true,
            "id": 203538,
            "effective_mtu": 1500,
            "vlan": null,
            "children": [],
            "tags": [],
            "type": "physical",
            "discovered": null,
            "resource_uri": "/MAAS/api/2.0/nodes/b7c4ry/interfaces/203538/",
            "parents": [],
            "mac_address": "00:16:3e:fb:53:b3"
        }
    ],
    "owner_data": {},
    "node_type_name": "Device",
    "node_type": 1,
    "zone": {
        "name": "default",
        "id": 1,
        "resource_uri": "/MAAS/api/2.0/zones/default/",
        "description": ""
    },
    "tag_names": [],
    "hostname": "juju-2988b5-0-lxd-2",
    "domain": {
        "resource_record_count": 0,
        "ttl": null,
        "authoritative": true,
        "name": "scapestack",
        "id": 1,
        "resource_uri": "/MAAS/api/2.0/domains/1/"
    }
}

Then juju tries to link the new device to the 10.96.0.0/17 (id=10) subnet, and that's what fails:

POST /MAAS/api/2.0/nodes/b7c4ry/interfaces/203538/?op=link_subnet HTTP/1.1
mode=STATIC&subnet=10
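
For anyone reproducing the request by hand, here is a minimal sketch of how the path and form body of the failing call are put together, using only the Python standard library. The function name `link_subnet_request` is hypothetical, and OAuth signing and the actual HTTP call are deliberately left out.

```python
from urllib.parse import urlencode


def link_subnet_request(system_id, interface_id, subnet_id, mode="STATIC"):
    """Build the path and form-encoded body of a link_subnet call.

    Only the pieces visible in the capture are reproduced here; the
    Authorization header is omitted.
    """
    path = (
        f"/MAAS/api/2.0/nodes/{system_id}/interfaces/{interface_id}/"
        "?op=link_subnet"
    )
    body = urlencode({"mode": mode, "subnet": subnet_id})
    return path, body


# Values taken from the capture: device b7c4ry, interface 203538, subnet id 10.
path, body = link_subnet_request("b7c4ry", 203538, 10)
print(path)
print(body, len(body))
```

The body length comes out as 21 bytes, which matches the Content-Length header in the captured request.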

Andreas Hasenack (ahasenack) wrote :

Same as above, but in a pastebin: http://pastebin.ubuntu.com/23511243/

Blake Rouse (blake-rouse) wrote :

Okay, I think I figured out the issue. When Juju creates the device, the interface is no longer placed in the default fabric + VLAN. The interface is created in disconnected mode and MAAS believes that it is not connected to any VLAN; in that case, no subnets exist that can be linked to that interface.

The fix here is that when link_subnet is called and the interface is disconnected, it should change to be connected to the same VLAN as the subnet that Juju wants to link the interface to.

This is completely a MAAS issue in 2.1 when disconnected interfaces became a thing.

Changed in juju:
status: New → Invalid
Changed in maas:
status: New → Triaged
importance: Undecided → Critical
milestone: none → 2.2.0
Andreas Hasenack (ahasenack) wrote :

We have other maas 2.1 servers where this isn't happening, any idea why?

Also, is there something we can do to this particular 2.1 server where it *is* happening, to put it back to work?

Chris Gregan (cgregan)
tags: added: cdo-qa-blocker
Blake Rouse (blake-rouse) wrote :

Interesting that you have other MAAS 2.1 servers where you are not seeing this issue. I would assume this would result in the same behavior for 2.1+. Can you provide the version differences, or are they exactly the same versions?

Also if you could provide the output of "maas session device read {device_id}" for both servers that would be helpful.

Mike Pontillo (mpontillo) wrote :

I think I see the issue. In MAAS 2.1 we no longer put interfaces on a VLAN by default. In the past, this would have worked *if and only if* the requested subnet was in the default fabric on the default VLAN.

An easy way to fix this, I think, is to allow *any* subnet to be selected if the VLAN is undefined. (Right now, I would bet that the form restricts subnet selection to subnets on the VLAN the interface is on.)
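
To make the idea concrete, here is a toy model of the subnet selection rule. It is not MAAS code: every class and function name below is hypothetical, VLANs are reduced to plain strings, and the "strict" branch is only a guess at the behavior described in this thread.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Subnet:
    id: int
    cidr: str
    vlan: str  # hypothetical VLAN label, not a real MAAS object


@dataclass
class Interface:
    name: str
    vlan: Optional[str] = None  # None models a "disconnected" interface


class SubnetNotFound(LookupError):
    pass


def allowed_subnets(interface: Interface, subnets: List[Subnet],
                    allow_any_when_unset: bool) -> List[Subnet]:
    """Subnets that may be linked to the interface."""
    if interface.vlan is None:
        # Strict rule: no VLAN means nothing matches.
        # Relaxed rule: no VLAN means any subnet may be chosen.
        return list(subnets) if allow_any_when_unset else []
    return [s for s in subnets if s.vlan == interface.vlan]


def link_subnet(interface: Interface, subnet_id: int, subnets: List[Subnet],
                allow_any_when_unset: bool = True) -> Subnet:
    for subnet in allowed_subnets(interface, subnets, allow_any_when_unset):
        if subnet.id == subnet_id:
            # Connect the interface to the VLAN of the linked subnet.
            interface.vlan = subnet.vlan
            return subnet
    raise SubnetNotFound("Subnet matching query does not exist.")


subnets = [Subnet(10, "10.96.0.0/17", "vlan-a")]
eth0 = Interface("eth0")

try:
    link_subnet(eth0, 10, subnets, allow_any_when_unset=False)
except SubnetNotFound as exc:
    print("strict rule:", exc)

linked = link_subnet(eth0, 10, subnets)
print("relaxed rule:", linked.cidr, "-> interface VLAN is now", eth0.vlan)
```

The relaxed rule also moves the interface onto the subnet's VLAN when the link succeeds, which is the behavior Blake suggested earlier in the thread.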

Mike Pontillo (mpontillo) wrote :

I've got a branch up that I'm pretty sure will fix this, if anyone wants to try patching their MAAS the same way.

https://code.launchpad.net/~mpontillo/maas/link-subnet-no-match--bug-1643057/+merge/311433

Changed in maas:
status: Triaged → In Progress
assignee: nobody → Mike Pontillo (mpontillo)
Mike Pontillo (mpontillo) wrote :

I think this bug also affects MAAS 1.9 and MAAS 2.0, just in a subtly different way: if the subnets juju is trying to assign aren't in the default fabric on the default VLAN, I think you'll get the same error.

This is a "Won't Fix" for those versions because the ability for a VLAN to be undefined was introduced in MAAS 2.1, and is risky to port to older versions of MAAS.

Changed in maas:
status: In Progress → Fix Committed
Andreas Hasenack (ahasenack) wrote :

I applied the diff to our maas server which was exhibiting this problem and it's fixed now.

Changed in maas:
milestone: 2.2.0 → none
status: Fix Committed → Fix Released
Mike Pontillo (mpontillo) wrote :

For the record, this was fixed in 2.2.0 and 2.1.2. I'm not sure why the milestones were removed.

Xav Paice (xavpaice) wrote :

MAAS 2.1.3 and Juju 2.0.2, seeing the same issue. I have, however, just upgraded MAAS from 2.1.2 and not redeployed any nodes, just fresh units via juju.

tags: added: canonical-bootstack
Xav Paice (xavpaice) wrote :

Workaround, in case others need help here: deploy the container, watch it fail to work.

Go to the machine (e.g. juju ssh 10) and:
lxc stop <container>
lxc config edit <container>
-> edit the parent interface
lxc start <container>

JuanJo Ciarlante (jjo) wrote :

Seconding Xav - this happened on an install with:
maas 2.1.3+bzr5573, although node was deployed w/2.1.1+bzr5544
juju 2.0.2
lxc 2.0.8

Is there any workaround that can be applied to already-deployed
nodes to avoid this issue (without redeploying them)?

Michał Ajduk (majduk) wrote :

I have the same problem:
MAAS Version 2.1.4+bzr5591
juju 2.1.2
lxc 2.0.7

Mike Pontillo (mpontillo) wrote :

The IP addresses in MAAS are linked up when juju deploys, so if you've already deployed I don't see an obvious workaround.
