MAAS showing incorrect IP for allocated machine

Bug #1375942 reported by Andreas Hasenack on 2014-09-30
This bug affects 1 person
Affects: MAAS
Importance: High
Assigned to: Julian Edwards

Bug Description

We start up a node in maas, and maas shows it has 10.96.2.10 and 10.96.2.11 addresses (see attached screenshot_006.png file). But these addresses are incorrect:
Sep 30 18:16:37 atlas dhcpd: DHCPDISCOVER from 2c:59:e5:4a:ed:90 via eth0
Sep 30 18:16:37 atlas dhcpd: DHCPOFFER on 10.96.2.146 to 2c:59:e5:4a:ed:90 via eth0
Sep 30 18:16:39 atlas dhcpd: DHCPREQUEST for 10.96.2.146 (10.96.0.10) from 2c:59:e5:4a:ed:90 via eth0
Sep 30 18:16:39 atlas dhcpd: DHCPACK on 10.96.2.146 to 2c:59:e5:4a:ed:90 via eth0
Sep 30 18:17:56 atlas dhcpd: DHCPDISCOVER from 2c:59:e5:4a:ed:90 via eth0
Sep 30 18:17:56 atlas dhcpd: DHCPOFFER on 10.96.2.146 to 2c:59:e5:4a:ed:90 via eth0
Sep 30 18:17:56 atlas dhcpd: DHCPREQUEST for 10.96.2.146 (10.96.0.10) from 2c:59:e5:4a:ed:90 via eth0
Sep 30 18:17:56 atlas dhcpd: DHCPACK on 10.96.2.146 to 2c:59:e5:4a:ed:90 via eth0

The real address that eth0 got is 10.96.2.146. All these addresses come from the static pool.
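
For anyone wanting to repeat this check, here is a minimal Python sketch that pulls the last acknowledged address per MAC out of dhcpd syslog output. The function name `acked_addresses` and the regular expression are illustrative assumptions, not part of MAAS. The sample lines are copied from the log above.

```python
import re

# Sample lines copied from the dhcpd log in this report.
LOG = """\
Sep 30 18:16:37 atlas dhcpd: DHCPDISCOVER from 2c:59:e5:4a:ed:90 via eth0
Sep 30 18:16:37 atlas dhcpd: DHCPOFFER on 10.96.2.146 to 2c:59:e5:4a:ed:90 via eth0
Sep 30 18:16:39 atlas dhcpd: DHCPREQUEST for 10.96.2.146 (10.96.0.10) from 2c:59:e5:4a:ed:90 via eth0
Sep 30 18:16:39 atlas dhcpd: DHCPACK on 10.96.2.146 to 2c:59:e5:4a:ed:90 via eth0
"""

# Matches "DHCPACK on <ipv4> to <mac>" as it appears in the lines above.
ACK_RE = re.compile(
    r"DHCPACK on (?P<ip>\d+(?:\.\d+){3}) "
    r"to (?P<mac>[0-9a-f]{2}(?::[0-9a-f]{2}){5})"
)


def acked_addresses(log_text):
    """Return {mac: last acknowledged IP} found in dhcpd syslog text."""
    result = {}
    for line in log_text.splitlines():
        match = ACK_RE.search(line)
        if match:
            # Later ACKs overwrite earlier ones, so the last one wins.
            result[match.group("mac")] = match.group("ip")
    return result


print(acked_addresses(LOG))
```

Comparing this result with what the MAAS node page shows is the same comparison made by hand in this report.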

When we release this node back into maas, and it's in the "Releasing" state, curiously at that moment it shows the node as having the .146 address (see attachment screenshot_007). Same when it's "Ready" again (attachment screenshot_008).

This was with maas 1.7.0~beta4+bzr3095-0ubuntu1.

Andreas Hasenack (ahasenack) wrote :

"Releasing"

Andreas Hasenack (ahasenack) wrote :

"Ready"

tags: added: landscape
Andreas Hasenack (ahasenack) wrote :

leases file

Andreas Hasenack (ahasenack) wrote :

Some random selects:

maasdb=# select * from maasserver_staticipaddress;
 id  |            created            |            updated            |     ip      | alloc_type | user_id
-----+-------------------------------+-------------------------------+-------------+------------+---------
 600 | 2014-09-23 21:26:36.602443+00 | 2014-09-23 21:26:36.602443+00 | 10.96.2.149 |          0 |
 601 | 2014-09-23 21:26:36.602443+00 | 2014-09-23 21:26:36.602443+00 | 10.96.2.146 |          0 |
(2 rows)

others in attachments

tags: added: cloud-installer
Christian Reis (kiko) on 2014-10-02
Changed in maas:
milestone: none → 1.7.0
importance: Undecided → Critical
Christian Reis (kiko) on 2014-10-02
Changed in maas:
assignee: nobody → Raphaël Badin (rvb)
Raphaël Badin (rvb) wrote :

Did the node also have the wrong IPs (10.96.2.10 and 10.96.2.11 — which I assume are from the dynamic pool) when it was 'Deployed'?

Raphaël Badin (rvb) wrote :

When the node isn't 'deployed', the IPs displayed are whatever is in the DHCP leases file. Since this file is only parsed periodically (once per minute), the static IPs might stick around until the leases file is parsed again. That would explain part of this bug.

The only real guarantee with IPs is with static IPs, once the node is deployed.

Changed in maas:
assignee: Raphaël Badin (rvb) → nobody
status: New → Triaged
Andreas Hasenack (ahasenack) wrote :

My static pool range is 10.96.2.10-254, so the IPs mentioned in this bug are all from the static pool. The dynamic pool is 10.96.1.10-254.
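
A quick way to confirm which pool an address belongs to is sketched below, using the two ranges quoted in this comment. The helper names `in_range` and `classify` and the `POOLS` table are made up for illustration and are not MAAS code.

```python
import ipaddress

# Ranges as stated in this comment (first and last address, inclusive).
POOLS = {
    "static": ("10.96.2.10", "10.96.2.254"),
    "dynamic": ("10.96.1.10", "10.96.1.254"),
}


def in_range(ip, first, last):
    """True if ip lies between first and last, inclusive."""
    addr = ipaddress.ip_address(ip)
    return ipaddress.ip_address(first) <= addr <= ipaddress.ip_address(last)


def classify(ip):
    """Return the name of the pool containing ip, or 'unknown'."""
    for name, (first, last) in POOLS.items():
        if in_range(ip, first, last):
            return name
    return "unknown"


for ip in ("10.96.2.10", "10.96.2.11", "10.96.2.146", "10.96.2.149"):
    print(ip, classify(ip))
```

All four addresses discussed in this bug fall in the static range under these definitions.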

Andreas Hasenack (ahasenack) wrote :

While in deployed state, maas said the node had 192.168.2.10 and .11, but in reality it had .146. Juju was trying to contact it via .10 and .11 and would never succeed. When I released the node (or when juju gave up), then it would show up with the .146 address in the maas node view page (!). Most weird.

Andreas Hasenack (ahasenack) wrote :

Argh, forgive my typing. Above I meant 10.96.0.10 and .11, not 192.168.2.10 and .11.

Christian Reis (kiko) on 2014-10-08
Changed in maas:
importance: Critical → High
Christian Reis (kiko) wrote :

Downgrading to High and pushing off given this is purely a visibility issue. Julian has a suggestion on how to fix this properly.

Changed in maas:
milestone: 1.7.0 → next

Here's what's happening:

- Static IPs are not allocated until the node is *started*
- What's happening here is that the *previous* startup's leases are being shown for the old static IPs
- As soon as the node starts up, all will be well
- As soon as the old lease expires, all will be well

However, we can do better and force-expire the lease when releasing the node from the user (use omshell to edit the lease object and set its expiry to the epoch).

Changed in maas:
assignee: nobody → Julian Edwards (julian-edwards)
status: Triaged → In Progress
Andres Rodriguez (andreserl) wrote :

We should consider backporting this to 1.7. While this might be a visibility issue, this leads users to believe that nodes have IP addresses that they don't really own.

On Wednesday 22 Oct 2014 02:10:57 you wrote:
> We should consider backporting this to 1.7. While this might be a
> visibility issue, this leads users to believe that nodes have IP
> addresses that they don't really own.

Agreed.

I'm looking at this in more detail again and this is a serious bug, not just a presentation bug, so I'm upgrading this to 1.7.0. See below for explanation:

> While in deployed state, maas said the node had .2.10 and 2.11
According to your screenshots, no it didn't, it was "deploying" and had not acquired new IPs yet. .10 and .11 seem to be old ones hanging around from some other time. This is a clue as to what's going on as they are not in the current leases file.

I have a new theory. I think my old one was wrong, because static ranges don't get lease objects generated at all, so they won't show up in the leases table (look at the leases file: all the lease objects are in the 10.96.1.N range).

 1. The 2.10 and 2.11 addresses: I suspect an old deployment that had gone wrong left them allocated in the database.

 2. Interfaces were reset to .146 and .149 from the static range when deployed, as expected.

 3. The node is released, but your host maps didn't get removed from dhcpd, so they are still showing.

I need to work out why the host maps didn't get removed. Is this still happening for you? I'd at least expect to see some entries like this in the leases file:

host 10.96.2.146 {
  dynamic;
  deleted;
}

If they are not there, something has gone badly wrong. Can you attach all your logs and tell me exactly what you're doing if
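
A rough way to look for such entries is sketched below in Python. It assumes host declarations in the leases file have the simple, un-nested shape shown above, and that a later declaration for the same host name supersedes an earlier one. The function name `host_states` and the sample text are hypothetical and are not taken from the attached leases file.

```python
import re

# Illustrative sample only; not copied from the leases file attached here.
SAMPLE = """\
host 10.96.2.146 {
  dynamic;
  hardware ethernet 2c:59:e5:4a:ed:90;
  fixed-address 10.96.2.146;
}
host 10.96.2.149 {
  dynamic;
  fixed-address 10.96.2.149;
}
host 10.96.2.146 {
  dynamic;
  deleted;
}
"""

# A host declaration at the start of a line, with a body free of nested braces.
HOST_RE = re.compile(r"^host\s+(?P<name>\S+)\s*\{(?P<body>[^}]*)\}", re.MULTILINE)


def host_states(leases_text):
    """Map each host name to 'deleted' or 'present', based on its last entry."""
    states = {}
    for match in HOST_RE.finditer(leases_text):
        statements = [s.strip() for s in match.group("body").split(";")]
        # Entries are visited in file order, so the last one decides.
        states[match.group("name")] = (
            "deleted" if "deleted" in statements else "present"
        )
    return states


print(host_states(SAMPLE))
```

A released node whose host name still maps to 'present' would be consistent with the host map not having been removed.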

Also, are those maasserver_staticipaddress entries there *after* you released the node (i.e. when it's in the READY state)?

Changed in maas:
milestone: next → 1.7.0
status: In Progress → Incomplete
Andreas Hasenack (ahasenack) wrote :

All the information I have is attached to this bug. Since it was filed, this maas server has been upgraded, purged, installed and reinstalled multiple times. There is no way we could have left it in this state for so long just for debugging purposes, sorry.

I can't tell anymore when the DB SELECT was done. Someone in #maas asked me to attach them to the bug because they could be interesting.

Julian Edwards (julian-edwards) wrote :

Is this still happening in your 1.7.0 installation?

Andreas Hasenack (ahasenack) wrote :

No, it didn't happen again.

Ok thanks. Given that info and my research above, I'm going to mark this invalid as I cannot recreate it either. Please re-open if you see this happening again.

(I suspect it may have been a transient bug during the early beta phase when we were migrating away from Celery.)

Changed in maas:
status: Incomplete → Invalid
Changed in maas:
status: Invalid → Fix Committed
Changed in maas:
status: Fix Committed → Invalid
Andreas Hasenack (ahasenack) wrote :

This happened again on a maas I upgraded from 1.6 to 1.7b8. A node that was already enlisted got an IP from the dynamic range when allocated. When I took it down, it showed as having an IP from the static range. Then I brought it up again, and it got the IP from the static range and all was fine.

On Friday 24 Oct 2014 20:47:46 you wrote:
> This happened again on a maas I upgraded from 1.6 to 1.7b8. A node that
> was already enlisted got an IP from the dynamic range when allocated.
> When I took it down, it showed as having an IP from the static range.
> Then I brought it up again, and it got the IP from the static range and
> all was fine.

This is somewhat expected behaviour. The dynamic IP is allocated when MAAS
doesn't know on which cluster interface the node is connected.

The first time it got booted, MAAS worked that out and the second boot
proceeded with the static IP from the range on the correct interface.

I don't know why the node wouldn't have a cluster interface link at that point
but I suspect it was enlisted before a fix went in sometime around 1.6 for
this.
