juju 1.25.1: lxc units all have the same IP address - changed to claim_sticky_ip_address

Bug #1519527 reported by Ryan Beisner on 2015-11-24
This bug affects 2 people
Affects   Importance   Assigned to
MAAS      Critical     onecoin
1.9       Critical     onecoin
Trunk     Critical     onecoin

Bug Description

With MAAS 1.9rc2 + proposed Juju 1.25.1, lxc units all possess the same IP address:
http://paste.ubuntu.com/13499208/.

With MAAS 1.9rc2 + stable Juju 1.25.0, lxc units get unique IP addresses as expected:
http://paste.ubuntu.com/13500012/.

This can be reproduced with any workload deployed via MAAS 1.9RC2. The issue is not specific to OpenStack.

Originally observed as:
I've run 5 bare metal deploy tests with Juju proposed 1.25.1, and all 5 have had one or more lxc units go into a "workload-state: error" + "agent-state: lost" condition. The same bundle has a passing test history with Juju 1.25.0.

Lab is MAAS 1.9RC2 ("dellstack").


Cheryl Jennings (cherylj) wrote :

Waiting on logs to examine.

Changed in juju-core:
status: New → Triaged
importance: Undecided → Critical
milestone: none → 1.25.2
Ryan Beisner (1chb1n) wrote :

All lxc units possess the same IP address, which is observable both from lxc-ls, and juju public IP. That'll be a problem.

# lxc-ls -f on units 0 through 6:
http://paste.ubuntu.com/13499229/

# juju stat, same deployed enviro as originally filed, after waiting several hrs
http://paste.ubuntu.com/13499208/

Ryan Beisner (1chb1n) wrote :

# `ip a` on units 0 through 6:
http://paste.ubuntu.com/13499353/

Ryan Beisner (1chb1n) on 2015-11-25
description: updated
Ryan Beisner (1chb1n) on 2015-11-25
description: updated
summary: - 1.25.1 as proposed: 1 or more lxc units lose agent state
+ 1.25.1 proposed: lxc units all have the same IP address

unit 0 & unit 2 logs attached as sample points.

Ryan Beisner (1chb1n) on 2015-11-25
description: updated
Dimiter Naydenov (dimitern) wrote :

Can you please attach the contents of /var/lib/juju/containers/ from machine 0 as well?
Also, try deploying a couple of units inside KVMs, rather than LXC and see if there's any difference?

Dimiter Naydenov (dimitern) wrote :

Looking at the logs it seems juju is generating unique MAC addresses for each container before registering them as devices in MAAS. Could it be an issue with MAAS postgres isolation level or something related to concurrently asking for an IP via DHCP and getting the same IP back?

Another useful log to have is the output of "maas <juju-user-profile> devices list" to see what devices juju created and what IPs MAAS knows for each device's MAC.

Gema Gomez (gema) on 2015-11-25
tags: added: sts
Andreas Hasenack (ahasenack) wrote :

FTR, it's working fine with maas 1.8.3+bzr4053-0ubuntu1~trusty1 and juju 1.25.1

Andreas Hasenack (ahasenack) wrote :

Container IPs being allocated just fine and in consecutive order

# grep Sticky /var/log/syslog
Nov 25 08:07:16 virtue maas.api: [INFO] juju-machine-0-lxc-0: Sticky IP address(es) allocated: 10.1.102.111
Nov 25 08:24:07 virtue maas.api: [INFO] juju-machine-0-lxc-0: Sticky IP address(es) allocated: 10.1.102.112
Nov 25 08:29:47 virtue maas.api: [INFO] juju-machine-0-lxc-1: Sticky IP address(es) allocated: 10.1.102.113
Nov 25 08:31:12 virtue maas.api: [INFO] juju-machine-0-lxc-2: Sticky IP address(es) allocated: 10.1.102.114
Nov 25 08:39:39 virtue maas.api: [INFO] juju-machine-1-lxc-0: Sticky IP address(es) allocated: 10.1.102.116

Andrew McDermott (frobware) wrote :

@beisner on IRC you mentioned:

<beisner> ok cool. i've got ~28MB compressed from this deploy. i'm going to tear down the deployment though and do a non-openstack vanilla reproducer bundle with a nothing-charm.

Please could you add the repro steps to the bug. Thanks.

Andreas Hasenack (ahasenack) wrote :

Confirmed busted.

juju 1.25.1 + maas 1.9 and all containers get the same IP. The device feature of maas is not used:

juju deploy ubuntu --to lxc:0 yielded this in the maas server syslog file:
Nov 25 17:49:37 maaslds dhcpd: DHCPDISCOVER from 00:16:3e:a9:76:e4 via eth0
Nov 25 17:49:37 maaslds dhcpd: DHCPOFFER on 10.245.202.8 to 00:16:3e:a9:76:e4 via eth0
Nov 25 17:49:37 maaslds dhcpd: DHCPREQUEST for 10.245.202.8 (10.245.200.27) from 00:16:3e:a9:76:e4 via eth0
Nov 25 17:49:37 maaslds dhcpd: DHCPACK on 10.245.202.8 to 00:16:3e:a9:76:e4 via eth0
Nov 25 17:50:10 maaslds dhcpd: DHCPREQUEST for 10.245.202.8 from 00:16:3e:a9:76:e4 via eth0
Nov 25 17:50:10 maaslds dhcpd: DHCPACK on 10.245.202.8 to 00:16:3e:a9:76:e4 via eth0
Nov 25 17:55:41 maaslds dhcpd: DHCPDISCOVER from 00:50:56:98:10:1c via eth0
Nov 25 17:55:42 maaslds dhcpd: DHCPOFFER on 10.245.201.33 to 00:50:56:98:10:1c via eth0

No record of a device allocation. And if I add-unit --to lxc:0, it gets the same IP:
Nov 25 17:56:32 maaslds dhcpd: DHCPREQUEST for 10.245.202.8 from 00:16:3e:cf:46:20 via eth0
Nov 25 17:56:32 maaslds dhcpd: DHCPACK on 10.245.202.8 to 00:16:3e:cf:46:20 via eth0
Nov 25 17:56:41 maaslds maas.lease_upload_service: [INFO] Uploading 54 DHCP leases to region controller.
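
For a longer syslog, the same check (one IP acknowledged to more than one MAC) can be automated. The following is only a minimal sketch: it assumes the dhcpd line format shown above, and the helper name `find_shared_leases` is made up for this illustration.

```python
import re
from collections import defaultdict

# Matches dhcpd lines such as:
#   "... dhcpd: DHCPACK on 10.245.202.8 to 00:16:3e:a9:76:e4 via eth0"
ACK_RE = re.compile(r"DHCPACK on (\S+) to ([0-9a-fA-F:]{17})\b")


def find_shared_leases(lines):
    """Return {ip: sorted MACs} for every IP acknowledged to more than one MAC."""
    macs_by_ip = defaultdict(set)
    for line in lines:
        match = ACK_RE.search(line)
        if match:
            ip, mac = match.groups()
            macs_by_ip[ip].add(mac.lower())
    return {ip: sorted(macs) for ip, macs in macs_by_ip.items() if len(macs) > 1}


# Two DHCPACK lines copied from the syslog excerpts above.
SAMPLE = [
    "Nov 25 17:49:37 maaslds dhcpd: DHCPACK on 10.245.202.8 to 00:16:3e:a9:76:e4 via eth0",
    "Nov 25 17:56:32 maaslds dhcpd: DHCPACK on 10.245.202.8 to 00:16:3e:cf:46:20 via eth0",
]

if __name__ == "__main__":
    for ip, macs in find_shared_leases(SAMPLE).items():
        print(ip, "was acknowledged to", ", ".join(macs))
```

Note that a lease legitimately moving from one MAC to another over time would also be flagged, so any hit still needs a look at the timestamps.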

$ juju status --format=tabular
[Services]
NAME STATUS EXPOSED CHARM
ubuntu unknown false cs:trusty/ubuntu-4

[Units]
ID WORKLOAD-STATE AGENT-STATE VERSION MACHINE PORTS PUBLIC-ADDRESS MESSAGE
ubuntu/0 unknown lost 1.25.1 0/lxc/0 10.245.202.8 agent is lost, sorry! See 'juju status-history ubuntu/0'
ubuntu/1 unknown lost 1.25.1 0/lxc/1 10.245.202.8 agent is lost, sorry! See 'juju status-history ubuntu/1'

[Machines]
ID STATE VERSION DNS INS-ID SERIES HARDWARE
0 started 1.25.1 node-7.vmwarestack /MAAS/api/1.0/nodes/node-5af81cf4-9399-11e5-a306-00505698101c/ trusty arch=amd64 cpu-cores=2 mem=8192M

Steps to reproduce:
1. Create a juju env using MAAS 1.9 as the provider with at least one registered and "Ready" node
2. juju bootstrap
3. juju deploy ubuntu --to lxc:0
4. juju add-unit ubuntu --to lxc:0

On step 3 already you will see that it's not reserving an IP via the API.
When using juju 1.25.1 and maas 1.8, the same operation yields entries like these:
Nov 25 08:39:38 virtue maas.api: [INFO] juju-machine-1-lxc-0: Added new device
Nov 25 08:39:39 virtue maas.api: [INFO] juju-machine-1-lxc-0: Sticky IP address(es) allocated: 10.1.102.116
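
The pairing above ("Added new device" followed by "Sticky IP address(es) allocated" for the same hostname) is what to look for in the MAAS syslog. A rough sketch of a checker that lists devices that were added but never got a sticky IP; the function name is hypothetical and the regular expressions assume the maas.api log format quoted in this thread:

```python
import re

ADDED_RE = re.compile(r"maas\.api: \[INFO\] (\S+): Added new device")
STICKY_RE = re.compile(
    r"maas\.api: \[INFO\] (\S+): Sticky IP address\(es\) allocated: (\S+)"
)


def devices_without_sticky_ip(lines):
    """Return hostnames that were added as devices but never got a sticky IP."""
    added = []          # hostnames in first-seen order
    allocated = set()   # hostnames with at least one sticky IP logged
    for line in lines:
        match = ADDED_RE.search(line)
        if match:
            if match.group(1) not in added:
                added.append(match.group(1))
            continue
        match = STICKY_RE.search(line)
        if match:
            allocated.add(match.group(1))
    return [name for name in added if name not in allocated]


# The first two lines are the MAAS 1.8 entries above; the third is the
# "Added new device" entry from the MAAS 1.9 log quoted later in this thread.
SAMPLE = [
    "Nov 25 08:39:38 virtue maas.api: [INFO] juju-machine-1-lxc-0: Added new device",
    "Nov 25 08:39:39 virtue maas.api: [INFO] juju-machine-1-lxc-0: Sticky IP address(es) allocated: 10.1.102.116",
    "Nov 25 17:47:20 maaslds maas.api: [INFO] juju-machine-0-lxc-0: Added new device",
]

if __name__ == "__main__":
    print(devices_without_sticky_ip(SAMPLE))
```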

tags: added: kanban-cross-team
Andreas Hasenack (ahasenack) wrote :

output of devices list

Andreas Hasenack (ahasenack) wrote :

/var/lib/juju/containers/ from machine 0

Andreas Hasenack (ahasenack) wrote :

lxc-ls -f on machine 0

Andreas Hasenack (ahasenack) wrote :

juju status --format=tabular

Andreas Hasenack (ahasenack) wrote :

maas syslog

Andreas Hasenack (ahasenack) wrote :

ip a on 0-lxc-0

Andreas Hasenack (ahasenack) wrote :

ip a on 0-lxc-1

Andreas Hasenack (ahasenack) wrote :

ip a on bootstrap

Andreas Hasenack (ahasenack) wrote :

This is a better maas log. Shows that devices were allocated, confirmed by the devices list output.

tags: removed: kanban-cross-team
Dimiter Naydenov (dimitern) wrote :

It looks like a MAAS issue - juju creates a device with a known MAC successfully, then tries calling claim_sticky_ip_address for the device, which does not fail, but also does not do what was expected - the device getting an IP.

Andreas Hasenack (ahasenack) wrote :

In the command line it fails:
$ maas andreas-vmwarestack device claim-sticky-ip-address node-93aa1ff4-939c-11e5-b4f2-00505698101c
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">
  <head>
    <title>Error: Conflict Error</title>
  </head>
  <body>
    <h2>
      Conflict error. Try your request again, as it will most likely succeed.
    </h2>
  </body>
</html>

andreas@nsn7:~$ echo $?
2

maas logs also show the 409:
2015-11-25 19:09:39 [-] 127.0.0.1 - - [25/Nov/2015:19:09:38 +0000] "POST /MAAS/api/1.0/devices/node-93aa1ff4-939c-11e5-b4f2-00505698101c/?op=claim_sticky_ip_address HTTP/1.1" 409 374 "-" "Python-httplib2/0.9 (gzip)"

Juju is indeed making these calls, and maas is logging a failure when doing the claim one:
Nov 25 17:47:20 maaslds maas.api: [INFO] juju-machine-0-lxc-0: Added new device
...
2015-11-25 17:47:20 [-] 127.0.0.1 - - [25/Nov/2015:17:47:19 +0000] "POST /MAAS/api/1.0/devices/?op=new HTTP/1.1" 200 302 "-" "Go 1.1 package http"
2015-11-25 17:47:37 [maasserver.utils.views] ERROR: Attempt #10 for /MAAS/api/1.0/devices/node-93aa1ff4-939c-11e5-b4f2-00505698101c/ failed; giving up (16.6s elapsed in total)
2015-11-25 17:47:37 [-] 127.0.0.1 - - [25/Nov/2015:17:47:36 +0000] "POST /MAAS/api/1.0/devices/node-93aa1ff4-939c-11e5-b4f2-00505698101c/?op=claim_sticky_ip_address HTTP/1.1" 409 374 "-" "Go 1.1 package http"

There is also a backtrace after that for which I filed another bug because I first saw it when trying to view the node in the MAAS UI: bug #1519918
2015-11-25 17:47:43 [-] 127.0.0.1 - - [25/Nov/2015:17:47:42 +0000] "GET /MAAS/api/1.0/nodes/?agent_name=008a8450-db9d-42a5-8e53-9e37c11c6ee5&id=node-5af81cf4-9399-11e5-a306-00505698101c&op=list HTTP/1.1" 200 1732 "-" "Go 1.1 package http"
2015-11-25 17:47:43 [maas.websocket.listener] Unhandled Error
        Traceback (most recent call last):
...
          File "/usr/lib/python2.7/dist-packages/provisioningserver/utils/network.py", line 292, in make_ipaddress
            return IPAddress(input)
          File "/usr/lib/python2.7/dist-packages/netaddr/ip/__init__.py", line 315, in __init__
            'address from %r' % addr)
        netaddr.core.AddrFormatError: failed to detect a valid IP address from u'10.245.200.27,10.245.200.1'
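
The last line of that traceback shows a single string holding two comma-separated addresses being parsed as one IP. The snippet below only illustrates why that fails; it uses Python's standard `ipaddress` module as a stand-in for netaddr, and `parse_addresses` is a hypothetical helper, not the fix that went into MAAS for bug #1519918.

```python
import ipaddress


def parse_addresses(value):
    """Parse a string holding one or more comma-separated IP addresses."""
    return [
        ipaddress.ip_address(part.strip())
        for part in value.split(",")
        if part.strip()
    ]


# The value from the traceback above.
raw = "10.245.200.27,10.245.200.1"

# Parsing the whole string as one address is rejected, much like netaddr did.
try:
    ipaddress.ip_address(raw)
    parsed_as_single = True
except ValueError:
    parsed_as_single = False

# Splitting first yields two valid addresses.
addresses = parse_addresses(raw)

if __name__ == "__main__":
    print("single parse ok:", parsed_as_single)
    print("split parse:", [str(a) for a in addresses])
```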

Dimiter Naydenov (dimitern) wrote :

Well, I haven't seen claim-sticky-ip-address fail with 409 in my tests, unless I explicitly call it with requested_address set to an address allocated to another node.

I've discovered a very similar MAAS issue (see bug #1514486) where containers get the same IP (FWIW the first IP from the subnet's static range, as it was unused), but claim-sticky-ip-address does not fail. It just takes 5-10 seconds to respond, and there were similar MAAS log entries.

Anyway, maybe since that last bug was fixed, claim-sticky-ip-address started returning 409 - I'll verify tomorrow.

https://github.com/juju/juju/pull/3730
This PR introduced the feature that exposed the current issue on MAAS 1.9b2+ with Juju 1.25.1+.

Mike Pontillo (mpontillo) wrote :

One issue we were seeing here with regard to MAAS was, when errors would occur in the middle of an API call, they would silently fail, and we wouldn't know why.

I worked with Andreas to triage this by having him apply this patch (via "cat patch | sudo patch -p2 -d /usr/lib/python2.7/dist-packages/maasserver"):

http://paste.ubuntu.com/13505469/

(as an aside, I want to land that as well - for support reasons, until we have proper observability via the API.)

We then saw this in the log:

maas.exception: [ERROR] The IP address 10.245.202.8 is already in use.

This points to _attempt_allocation() in staticipaddress.py. When this occurs, the expected behavior is that the transaction will be marked as a serialization failure, and retried (for example, if two containers are trying to reserve the same IP address in parallel, you might expect that to happen). It's a bit of a mystery why (in this setup) the serialization failures are reproducible, even when you run claim_sticky_ip_address from the command line.
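
The retry behavior described above can be pictured roughly as follows. This is a generic sketch, not the MAAS code: `SerializationFailure`, `run_with_retries` and `make_flaky` are invented names, and the default of 10 attempts is taken only from the "Attempt #10 ... giving up" line in the log.

```python
import time


class SerializationFailure(Exception):
    """Stand-in for the database error raised when transactions conflict."""


def run_with_retries(operation, attempts=10, delay=0.0):
    """Call operation() until it succeeds or the attempts are used up."""
    for attempt in range(1, attempts + 1):
        try:
            return operation()
        except SerializationFailure:
            if attempt == attempts:
                raise  # give up; the caller would turn this into a 409
            time.sleep(delay)


def make_flaky(failures, result):
    """Build an operation that conflicts `failures` times, then succeeds."""
    state = {"calls": 0}

    def operation():
        state["calls"] += 1
        if state["calls"] <= failures:
            raise SerializationFailure("conflict")
        return result

    return operation, state


if __name__ == "__main__":
    op, state = make_flaky(2, "allocated")
    print(run_with_retries(op), "after", state["calls"], "calls")
```

If the conflict is not transient (as it appears to be in this bug), every attempt fails the same way and the retries only add latency before the 409.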

Mike Pontillo (mpontillo) wrote :

This one is puzzling, because it seems that we get an error saying that the address already exists, *just* after we check to be sure it doesn't exist yet. *And*, according to the database, it in fact *doesn't* exist.

At this point I'm suspecting that there is some flaw in the way we're setting up the transactions in the database...

no longer affects: juju-core
summary: - 1.25.1 proposed: lxc units all have the same IP address
+ MAAS 1.9b2+ with juju 1.25.1: lxc units all have the same IP address

Dimiter, could you just please make sure that juju is catching that 409
when it tries the maas claim call? The maas logs show that response, so
juju must be getting it.

Andreas, Juju is catching any errors from claim-sticky-ip-address. At TRACE level each relevant MAAS API call response is logged regardless of outcome. On a successful claim-sticky-ip-address response, there should be an INFO level log like this:

INFO juju.provider.maas environ.go:1564 reserved sticky IP address for device "node-dfa45ca0-92f5-11e5-8cca-d4bed9a84493" representing container "juju-machine-0-lxc-0"

That should appear shortly after the device creation, logged as:

INFO juju.provider.maas environ.go:1543 created device "node-dfa45ca0-92f5-11e5-8cca-d4bed9a84493" for container "juju-machine-0-lxc-0" with MAC address "00:16:3e:fb:06:d7" on parent node "/MAAS/api/1.0/nodes/node-d4692494-8228-11e4-8078-d4bed9a84493/"

I can't see the "reserved sticky.." log anywhere, which is weird, so I investigated some more.

Looking at how the code is implemented, I can now confirm errors from claim-sticky-ip-address are effectively ignored further up the stack. I've filed another bug #1520199 about this, as Juju can do better - in fact the absence of an IP address in the response should be caught and reported as an error. But that won't solve the current issue with MAAS misbehaving.
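
To illustrate the kind of check meant here (Juju itself is written in Go; this is only a Python sketch, not the change made for bug #1520199): treat any non-200 status, or a 200 whose device JSON has an empty "ip_addresses" list, as a failed claim. `ClaimError` and `claimed_addresses` are hypothetical names, and the response shape is assumed from the device JSON that the maas CLI prints.

```python
import json


class ClaimError(Exception):
    """Raised when a claim did not leave the device with an IP address."""


def claimed_addresses(status_code, body):
    """Return the device's IP addresses, or raise ClaimError.

    Assumes a successful response body is JSON containing an
    "ip_addresses" list and a "system_id" string.
    """
    if status_code != 200:
        # For example the 409 Conflict in the MAAS log; the body is HTML then.
        raise ClaimError("claim_sticky_ip_address returned HTTP %d" % status_code)
    device = json.loads(body)
    addresses = device.get("ip_addresses") or []
    if not addresses:
        raise ClaimError(
            "no IP address allocated for device %s"
            % device.get("system_id", "<unknown>")
        )
    return addresses


if __name__ == "__main__":
    try:
        claimed_addresses(200, '{"system_id": "node-x", "ip_addresses": []}')
    except ClaimError as err:
        print("claim failed:", err)
```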


Thanks. At least maas told juju it was misbehaving ;)

Given that with Juju 1.25.0 + MAAS 1.9b2, the container IPs were sane and unique, I think Juju should still track and block on this bug as a regression with shared interest in releasing a functional MAAS 1.9.x + JUJU 1.25.x combo.

Whether that means waiting for MAAS to be fix-released on a new beta, or deferring 1.25.1, I think we should not release 1.25.1 until a confirmed combo is ready and consumable via ppa.

I know that if 1.25.1 releases alone as scheduled, it *WILL* break my test automation for charm testing on bare metal, as we have already upgraded to MAAS 1.9b2 so that we can exercise network spaces features. I would be willing to bet that others will be similarly blocked.

I've re-added Juju to the affected list. Thank you all.

Andres Rodriguez (andreserl) wrote :

Hi Ryan,

I have a quick question. You said this:

With the proposed Juju 1.25.1, lxc units all possess the same IP address:
http://paste.ubuntu.com/13499208/.

With stable Juju 1.25.0, lxc units get unique IP addresses as expected:
http://paste.ubuntu.com/13500012/.

Can you still reproduce against 1.9b2 ?

Andres Rodriguez (andreserl) wrote :

Provided that Ryan has confirmed and verified that when running Juju 1.25.0 against MAAS 1.9rc2 everything works as expected, but when running 1.25.1 against the same MAAS 1.9rc2, it does not work as expected, this seems like an issue in Juju rather than MAAS. I'm updating the bug accordingly.

summary: - MAAS 1.9b2+ with juju 1.25.1: lxc units all have the same IP address
+ juju 1.25.1: lxc units all have the same IP address
summary: - juju 1.25.1: lxc units all have the same IP address
+ juju 1.25.1: lxc units all have the same IP address - changes to
+ claim_sticky_ip_address
summary: - juju 1.25.1: lxc units all have the same IP address - changes to
+ juju 1.25.1: lxc units all have the same IP address - changed to
claim_sticky_ip_address
Dimiter Naydenov (dimitern) wrote :

Andres, the issue with b2 is that "maas <profile> device claim-sticky-ip-address <device-id>" (which juju 1.25.1 uses after creating a device for the container, with its host node as parent) fails to allocate an address for the device, despite having properly configured ranges and available IPs.

Cheryl Jennings (cherylj) wrote :

I agree that we cannot move juju 1.25.1 out of proposed until the issue is resolved. Discussions between dimitern and mpontillo indicate that this is a maas 1.9beta2+ regression which changes in juju 1.25.1 exposed.

Ryan Beisner (1chb1n) on 2015-11-30
description: updated
Ryan Beisner (1chb1n) wrote :

Here is a non-OpenStack generic reproducer to re-confirm.

Generic reproducer bundle:
http://paste.ubuntu.com/13576737/

PASS: MAAS 1.9b2 + Juju 1.25.0
lxc units get unique IPs: http://paste.ubuntu.com/13576758/

FAIL: MAAS 1.9b2 + Juju 1.25.1
lxc units all have the same IP: http://paste.ubuntu.com/13578689/

Reproduction steps:
Deploy the bundle, once with [MAAS 1.9b2 + Juju 1.25.0] and once with [MAAS 1.9b2 + Juju 1.25.1]:
`juju bootstrap && juju-deployer -v -c ubuntu18lxc.yaml -d vivid`

Andreas Hasenack (ahasenack) wrote :

> I agree that we cannot move juju 1.25.1 out of proposed until the issue is resolved.
> Discussions between dimitern and mpontillo indicate that this is a maas 1.9beta2+
> regression which changes in juju 1.25.1 exposed.

It also exposed a bug in juju-core (it ignores the fact that the claim-sticky-ip-address call actually failed): https://bugs.launchpad.net/juju-core/+bug/1520199

Andres Rodriguez (andreserl) wrote :

Are you guys using B2 or RC2 ?

Ryan Beisner (1chb1n) wrote :

@andreserl

RC2

Please disregard my mentions of B2. Indeed I am using RC2.

description: updated
Mike Pontillo (mpontillo) wrote :

I can confirm that I have replicated the issue in my local MAAS test bed.

I'm seeing behavior such as this:

$ maas admin device claim-sticky-ip-address node-ad58adcc-980a-11e5-a692-525400130e6f
Success.
Machine-readable output follows:
{
    "macaddress_set": [
        {
            "mac_address": "01:02:03:04:05:07"
        }
    ],
    "zone": {
        "resource_uri": "/MAAS/api/1.0/zones/default/",
        "name": "default",
        "description": ""
    },
    "parent": "node-0f48bcc8-9263-11e5-9fc7-525400130e6f",
    "ip_addresses": [],
    "hostname": "cheap-coast.maas",
    "system_id": "node-ad58adcc-980a-11e5-a692-525400130e6f",
    "owner": "root",
    "tag_names": [],
    "resource_uri": "/MAAS/api/1.0/devices/node-ad58adcc-980a-11e5-a692-525400130e6f/"
}

I would expect to see some ip_addresses here, but we've got nothing.

I'm still working on the root cause.

Mike Pontillo (mpontillo) wrote :

Getting closer:

http://paste.ubuntu.com/13595538/

I still don't know exactly why this is happening, but I think I can get it fixed within a day.

Dimiter Naydenov (dimitern) wrote :

Related bug #1520199 fixed in 1.25 and master.

no longer affects: juju-core
Changed in juju-core (Ubuntu):
status: New → Invalid
affects: juju-core (Ubuntu) → ubuntu
affects: ubuntu → juju-core
no longer affects: juju-core
Andres Rodriguez (andreserl) wrote :

Ok,

After doing some manual testing yesterday, I've been unable to reproduce this bug.

This is what I see in the MAAS logging:

Dec 3 06:56:15 trusty-maas9 maas.api: [INFO] juju-machine-1-lxc-0: Added new device
Dec 3 06:56:15 trusty-maas9 maas.api: [INFO] juju-machine-1-lxc-0: Sticky IP address(es) allocated: 192.168.10.104
Dec 3 06:56:15 trusty-maas9 maas.dns: [INFO] Generating new DNS zone file for maas
Dec 3 06:56:15 trusty-maas9 maas.dns: [INFO] Generating new DNS zone file for 10.168.192.in-addr.arpa
Dec 3 06:57:06 trusty-maas9 maas.import-images: [INFO] Started importing boot images.
Dec 3 06:57:06 trusty-maas9 maas.import-images: [INFO] Finished importing boot images, the region does not have any new images.
Dec 3 06:57:11 trusty-maas9 maas.lease_upload_service: [INFO] Uploading 48 DHCP leases to region controller.
Dec 3 06:59:59 trusty-maas9 maas.api: [INFO] juju-machine-1-lxc-1: Added new device
Dec 3 06:59:59 trusty-maas9 maas.api: [INFO] juju-machine-1-lxc-1: Sticky IP address(es) allocated: 192.168.10.105
Dec 3 06:59:59 trusty-maas9 maas.dns: [INFO] Generating new DNS zone file for maas
Dec 3 06:59:59 trusty-maas9 maas.dns: [INFO] Generating new DNS zone file for 10.168.192.in-addr.arpa
Dec 3 07:00:08 trusty-maas9 maas.api: [INFO] juju-machine-1-lxc-2: Added new device
Dec 3 07:00:08 trusty-maas9 maas.api: [INFO] juju-machine-1-lxc-2: Sticky IP address(es) allocated: 192.168.10.106
Dec 3 07:00:09 trusty-maas9 maas.dns: [INFO] Generating new DNS zone file for maas
Dec 3 07:00:09 trusty-maas9 maas.dns: [INFO] Generating new DNS zone file for 10.168.192.in-addr.arpa

I've verified that the containers get the right IP address and DNS record, and everything works as expected. That being said, this was a fresh 1.9 install and not an upgrade from 1.8.

We are going to try to reproduce this tomorrow, but in the meantime, I'll mark this bug as incomplete until Ryan can set up his environment all over again and we can work through this.

Thanks.

Andres Rodriguez (andreserl) wrote :

One more thing. Looking at Andreas's log, what might be the issue is that there are no more available IP addresses on the static range to allocate to these machines:

netaddr.core.AddrFormatError: failed to detect a valid IP address from u'10.245.200.27,10.245.200.1'

We'll try to debug this tomorrow.

Mike Pontillo (mpontillo) wrote :

The AddrFormatError is bug 1519918.

I have been testing this extensively as well, and have not been able to reproduce this exact issue using juju 1.25.1.

We have a few more theories about what might be happening:

 - This behavior is a symptom of the problems addressed as of my yet-to-be-landed branch here:
    https://code.launchpad.net/~mpontillo/maas/ip-allocation-bugs-1.9/+merge/278925

 - This behavior occurs because a static range is not defined for the cluster interface associated with the subnet the interface is on. (after the aforementioned branch lands, this will be an error message in the log, and a failed deployment.)

 - This behavior occurs because of a migration from a previous version of MAAS (perhaps a migration that has been fixed by a more recent release candidate, and thus will not run again)

 - This behavior occurs because the cluster interface did not have a subnet mask defined (either because it is unmanaged, or because of a misbehaving database migration). The aforementioned branch also fixes some of these corner cases, and we will now find cluster interfaces even though they are disassociated from their subnets (as a fallback, just in case).

 - This behavior occurs because postgresql settings were changed on the system to affect the database isolation level, thus causing our transactional logic (which prevents duplicate IP addresses from being assigned) to fail, and allowing duplicate IP addresses in the database (when our code, combined with the correct database isolation level, would prevent it).

As an aside, with MAAS 1.8.3 we found a separate issue: deploying LXCs with juju and then destroying the environment leaves the STICKY (static) IP addresses associated with the LXC MAC addresses in the database. It's possible that this could lead to IP address exhaustion in a situation like this. If this is happening, and a large number of "Sticky" IP addresses appear in the subnet details page (not associated with any node or device), running the following commands will clean it up on MAAS 1.9:

sudo maas-region-admin dbshell
delete from maasserver_staticipaddress ip
    where ip.id in (
        select ip.id from maasserver_staticipaddress id
            left outer join maasserver_interface_ip_addresses iip
                on ip.id = iip.staticipaddress_id
                where iip.id is null and ip.alloc_type=1
    );

Mike Pontillo (mpontillo) wrote :

I'm going to mark this "Incomplete" until we get more data on the cause of the duplicate IP addresses. Please reach out to me on IRC if you have an environment where you can reproduce this.

Again, it's possible that the branch I'm working on will fix this, so it would also be good to apply the changes from that branch to see if the outcome changes.

Failing that, I'd like to see if this bug can be seen on a completely fresh Ubuntu + MAAS install, to eliminate the variables of pre-release migrations and long-forgotten postgresql configuration tweaks affecting the outcome. Finally, we will need the MAAS logs if this issue can be reproduced. (I have not yet seen any MAAS logs which show us allocating the duplicate IP address to a different node, which makes me wonder if the IP addresses were obtained by DHCP - possibly with a short lease time.)

Mike Pontillo (mpontillo) wrote :

Also (sorry) there is a typo in the SQL commands above. That should be:

sudo maas-region-admin dbshell
delete from maasserver_staticipaddress ip
    where ip.id in (
        select ip.id from maasserver_staticipaddress ip
            left outer join maasserver_interface_ip_addresses iip
                on ip.id = iip.staticipaddress_id
                where iip.id is null and ip.alloc_type=1
    );
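
Before deleting anything, it may be worth previewing which rows the inner select matches. The sketch below runs that select against a throwaway in-memory SQLite database rather than the real MAAS PostgreSQL database; the two tables are cut down to the columns used in the query, the `ip` column and all sample rows are assumptions made for this illustration.

```python
import sqlite3

# Simplified stand-ins for the two MAAS tables named in the query above.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    create table maasserver_staticipaddress (
        id integer primary key, ip text, alloc_type integer);
    create table maasserver_interface_ip_addresses (
        id integer primary key, staticipaddress_id integer);
    insert into maasserver_staticipaddress values
        (1, '10.1.102.111', 1),  -- sticky, still linked to an interface
        (2, '10.1.102.112', 1),  -- sticky, orphaned
        (3, '10.1.102.113', 4);  -- orphaned, but a different alloc_type
    insert into maasserver_interface_ip_addresses values (1, 1);
""")

# Same join and filter as the inner select of the delete statement.
ORPHANS_SQL = """
    select ip.id, ip.ip from maasserver_staticipaddress ip
        left outer join maasserver_interface_ip_addresses iip
            on ip.id = iip.staticipaddress_id
        where iip.id is null and ip.alloc_type = 1
        order by ip.id
"""
orphans = conn.execute(ORPHANS_SQL).fetchall()

if __name__ == "__main__":
    for row_id, ip in orphans:
        print("would delete id=%d ip=%s" % (row_id, ip))
```

Only the orphaned row with alloc_type=1 is selected; linked rows and rows with other allocation types are left alone.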

I'm on PTO today and tomorrow, and sprint next week. I do have some hours
to kill at the airport, so I might be able to get you this info before next
week.

Blake Rouse (blake-rouse) wrote :

Okay, the issue is that the parent node can have multiple discovered IP addresses on the same subnet. That causes the code to try to allocate the same IP address twice. I have a fix for this.
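
As a rough picture of that failure mode (not the actual fix, which is in the MAAS branch): if allocation is driven once per discovered parent address, two addresses on one subnet trigger two allocations on that subnet. Collapsing the addresses to distinct subnets first avoids that. The function name and the /22 subnet below are assumptions made for this sketch.

```python
import ipaddress


def subnets_to_allocate_on(discovered_ips, subnets):
    """Return each subnet containing a discovered IP once, in first-seen order."""
    networks = [ipaddress.ip_network(s) for s in subnets]
    seen = []
    for ip in discovered_ips:
        address = ipaddress.ip_address(ip)
        for network in networks:
            if address in network and network not in seen:
                seen.append(network)
    return seen


# Two discovered addresses on the same (hypothetical) subnet.
DISCOVERED = ["10.245.200.27", "10.245.200.1"]
SUBNETS = ["10.245.200.0/22"]

if __name__ == "__main__":
    targets = subnets_to_allocate_on(DISCOVERED, SUBNETS)
    print(len(DISCOVERED), "discovered addresses ->", len(targets), "allocation(s)")
```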

Andreas Hasenack (ahasenack) wrote :

Looks like you guys reproduced it, but just to make it clear, here are the steps I took:

- fresh deploy MAAS 1.9.0~rc2+bzr4509-0ubuntu1~trusty1
- register a node with it
- use juju 1.25.1 from proposed, and agent-stream proposed (important)
- juju bootstrap
- juju deploy ubuntu --to lxc:0
- juju add-unit ubuntu --to lxc:0

This is the result:

andreas@nsn7:~$ juju status --format=tabular
[Services]
NAME STATUS EXPOSED CHARM
ubuntu error false cs:trusty/ubuntu-4

[Units]
ID WORKLOAD-STATE AGENT-STATE VERSION MACHINE PORTS PUBLIC-ADDRESS MESSAGE
ubuntu/0 unknown lost 1.25.1 0/lxc/0 10.0.5.152 agent is lost, sorry! See 'juju status-history ubuntu/0'
ubuntu/1 error lost 1.25.1 0/lxc/1 10.0.5.152 hook failed: "leader-settings-changed"

[Machines]
ID STATE VERSION DNS INS-ID SERIES HARDWARE
0 started 1.25.1 ruddy-legs.kvmmaas /MAAS/api/1.0/nodes/node-4ef315b8-9ccb-11e5-8ae1-52540042c56f/ trusty arch=amd64 cpu-cores=1 mem=2048M tags=virtual

Andreas Hasenack (ahasenack) wrote :

Confirmed fixed in 1.9.0~rc3+bzr4525-0ubuntu1~trusty1 for me:

[Units]
ID WORKLOAD-STATE AGENT-STATE VERSION MACHINE PORTS PUBLIC-ADDRESS MESSAGE
ubuntu/0 unknown idle 1.25.1 0/lxc/0 10.0.5.152
ubuntu/1 unknown idle 1.25.1 0/lxc/1 10.0.5.153

Ryan Beisner (1chb1n) wrote :

Also confirming:

lxc units get unique IP addresses with Juju 1.25.1 + MAAS 1.9.0 (rc3+bzr4525).

Thank you all for your work on this!
