[1.9] Device & deployed machine both have same static IP's

Bug #1581250 reported by Larry Michel
This bug affects 1 person
Affects  Status        Importance  Assigned to  Milestone
MAAS     Fix Released  Critical    Unassigned
1.9      Won't Fix     Wishlist    Unassigned

Bug Description

While troubleshooting an issue with juju bootstrap where connections to port 17070 were periodically being refused, I noticed that I was sshing into the wrong system when trying to access the bootstrap node. It turns out that the same IP address was assigned to 2 systems.

This actually happened another time, when I found myself sshing into the wrong system, but at that point I was unsure whether I had the right system in the first place.

Here's the scenario for this bug:

1) During deployment, juju deployer lost contact with the bootstrap node, hayward-63, which had been assigned the address 10.244.192.169.
2) To preserve the server in that state for debugging purposes, I modified the power parameters to prevent it from being powered off, and also marked it as broken.
3) Today hayward-63 was powered off and powered back on through the BMC.
4) I then tried to ssh into it (hayward-63) but ended up sshing into another server, tucker.
5) I then deleted hayward-63 from MAAS and added it as a device after selecting Static, so I could boot it from disk and get it back to the debugging state. I then noticed that the device was assigned 10.244.192.169, the same address it had been assigned during deployment.

This is the device record from the MAAS UI:
FQDN                   MAC                IP Assignment  IP Address      Owner
hayward-63.oilstaging  00:22:99:e0:04:67  Static         10.244.192.169  root

Looking in the dhcp.leases file, I found the MAC address of hayward-63's interface and a MAC address from tucker both in DHCP records showing this IP address, 10.244.192.169, which I think shows that this is a duplicated lease.

Looking further, I see that there are a total of 4 entries and they all say dynamic (even though this IP is originally from the static range -- not sure whether this is by design).

host 2c-59-e5-41-a8-6c {
  dynamic;
  hardware ethernet 2c:59:e5:41:a8:6c;
  fixed-address 10.244.192.169;
}
host 00-22-99-e0-03-37 {
  dynamic;
  hardware ethernet 00:22:99:e0:03:37;
  fixed-address 10.244.192.169;
}
host 00-22-99-e0-04-67 {
  dynamic;
  hardware ethernet 00:22:99:e0:04:67;
  fixed-address 10.244.192.169;
}
host 90-b1-1c-5b-37-e4 {
  dynamic;
  hardware ethernet 90:b1:1c:5b:37:e4;
  fixed-address 10.244.192.169;
}
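The manual check described above (looking for one fixed-address claimed by several host entries) can be scripted. The following is a minimal sketch, not part of MAAS: the function names are made up for illustration, and it assumes the leases file contains host blocks in the form shown above, that a later block for the same host name supersedes an earlier one, and that a block containing "deleted;" removes the host.

```python
import re
from collections import defaultdict

# One "host NAME { ... }" block, in the form shown in the leases excerpt.
HOST_BLOCK = re.compile(r"\bhost\s+(\S+)\s*\{(.*?)\}", re.DOTALL)
MAC = re.compile(r"hardware\s+ethernet\s+([0-9a-fA-F:]{17})\s*;")
ADDR = re.compile(r"fixed-address\s+([0-9.]+)\s*;")
DELETED = re.compile(r"\bdeleted\s*;")


def active_hosts(leases_text):
    """Return {host_name: (mac, ip)} for host entries still in effect.

    Assumption: blocks are read in file order, a later block for the same
    name replaces an earlier one, and "deleted;" removes the host.
    """
    hosts = {}
    for name, body in HOST_BLOCK.findall(leases_text):
        if DELETED.search(body):
            hosts.pop(name, None)
            continue
        mac = MAC.search(body)
        addr = ADDR.search(body)
        if mac and addr:
            hosts[name] = (mac.group(1).lower(), addr.group(1))
    return hosts


def find_duplicate_fixed_addresses(leases_text):
    """Return {ip: [mac, ...]} for every IP claimed by more than one MAC."""
    macs_by_ip = defaultdict(set)
    for mac, ip in active_hosts(leases_text).values():
        macs_by_ip[ip].add(mac)
    return {ip: sorted(macs) for ip, macs in macs_by_ip.items() if len(macs) > 1}
```

Running find_duplicate_fixed_addresses() on the contents of the leases file would return one entry per conflicting IP. For the four host blocks above, that would be 10.244.192.169 mapped to all four MAC addresses.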

We've hit the original issue quite a bit and while it's not clear whether it's the duplicate IP causing it, I plan on checking the lease file every time we hit this issue.

I am attaching the maas log and lease files.

Note that hayward-63 was deployed 2 days ago, while tucker was deployed 6 days ago and was being used to onboard a new network adapter, so its network configuration is different. Even though it's still in the deployed state, there is no DNS entry for it:

ubuntu@tucker:~$ ifconfig
ens1 Link encap:Ethernet HWaddr 7c:fe:90:b7:28:10
          inet addr:10.244.166.89 Bcast:10.244.191.255 Mask:255.255.192.0
          inet6 addr: fe80::7efe:90ff:feb7:2810/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
          RX packets:258278 errors:0 dropped:0 overruns:0 frame:0
          TX packets:3498 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:24829812 (24.8 MB) TX bytes:708276 (708.2 KB)

eth0 Link encap:Ethernet HWaddr 2c:59:e5:41:a8:6c
          inet addr:10.244.192.169 Bcast:10.244.255.255 Mask:255.255.192.0
          inet6 addr: fe80::2e59:e5ff:fe41:a86c/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
          RX packets:3502316 errors:0 dropped:0 overruns:0 frame:0
          TX packets:128877 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:301940215 (301.9 MB) TX bytes:7423778 (7.4 MB)
          Memory:fbd00000-fbdfffff

eth1 Link encap:Ethernet HWaddr 2c:59:e5:41:a8:6d
          inet6 addr: fe80::2e59:e5ff:fe41:a86d/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
          RX packets:3326222 errors:0 dropped:0 overruns:0 frame:0
          TX packets:8 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:218905125 (218.9 MB) TX bytes:648 (648.0 B)
          Memory:fbb00000-fbbfffff
...

ubuntu@maas-integration-september:~$ ping tucker.oilstaging
ping: unknown host tucker.oilstaging
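As a quick cross-check of the ifconfig output above, the addresses and netmasks reported for ens1 and eth0 can be fed to Python's standard ipaddress module to see which networks they belong to. This is only an illustrative sketch; the helper name network_of is made up here.

```python
import ipaddress


def network_of(address, netmask):
    """Return the IPv4 network that contains address under the given netmask."""
    return ipaddress.ip_interface(f"{address}/{netmask}").network


# Values copied from the ifconfig output for tucker.
ens1_net = network_of("10.244.166.89", "255.255.192.0")
eth0_net = network_of("10.244.192.169", "255.255.192.0")

print(ens1_net, ens1_net.broadcast_address)
print(eth0_net, eth0_net.broadcast_address)
print(ens1_net.overlaps(eth0_net))
```

This prints 10.244.128.0/18 with broadcast 10.244.191.255 for ens1 and 10.244.192.0/18 with broadcast 10.244.255.255 for eth0, matching the Bcast fields above, followed by False. So tucker's two configured interfaces sit on different /18 networks, and the contested address 10.244.192.169 is the one on eth0.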

Also note that this bug looks very similar to bug 1562226.

Tags: oil
Larry Michel (lmic) wrote :
Larry Michel (lmic)
description: updated
Andres Rodriguez (andreserl) wrote :

Hi Larry,

While the node is deployed, can you attach:

1. curtin config for the machine
2. /etc/network/interfaces of the machine.

What I'm guessing is happening:

1. MAAS creates /etc/network/interfaces with static IP addresses
2. Juju overwrites /e/n/i and creates a bridge, which makes it DHCP
3. Bridge gets a different address.

If that's the case, I'd say you need to use the container registration feature, or have juju fix the issue.

Changed in maas:
status: New → Incomplete
summary: - Maas assign same IP to multiple nodes including juju bootstrap node
+ Juju replaces /e/n/i with a bridge that DHCP's and causes machine to get
+ a different IP from assigned
Changed in maas:
milestone: none → 1.9.3
Larry Michel (lmic) wrote : Re: Juju replaces /e/n/i with a bridge that DHCP's and causes machine to get a different IP from assigned

This is for 2). For 1), the curtin config is gone:

auto lo
iface lo inet loopback
    dns-nameservers 10.244.192.10
    dns-search oilstaging

iface eth0 inet manual

auto juju-br0
iface juju-br0 inet static
    gateway 10.244.192.1
    address 10.244.192.169/18
    mtu 1500
    bridge_ports eth0

auto eth1
iface eth1 inet manual
    mtu 1500

auto eth2
iface eth2 inet manual
    mtu 1500

auto eth3
iface eth3 inet manual
    mtu 1500

auto eth4
iface eth4 inet manual
    mtu 1500

auto eth5
iface eth5 inet manual
    mtu 1500

auto eth6
iface eth6 inet manual
    mtu 1500

auto eth7
iface eth7 inet manual
    mtu 1500

Mike Pontillo (mpontillo) wrote :

What MAC address is assigned to juju-br0? If it's using a different MAC, that could explain the duplicate leases.

summary: - Juju replaces /e/n/i with a bridge that DHCP's and causes machine to get
- a different IP from assigned
+ Device & deployed machine both have same static IP's
Andres Rodriguez (andreserl) wrote : Re: Device & deployed machine both have same static IP's

Some findings:

tucker is a machine in the deployed state which has IP 10.244.166.89. However, tucker's NIC config is:

eth0 Physical Default fabric untagged Unconfigured (Unconfigured)

eth1 Physical Default fabric untagged OIL-subnet (Unconfigured)

eth2 Physical Default fabric untagged Unconfigured (Unconfigured)

eth3 Physical Default fabric untagged Unconfigured (Unconfigured)

While tucker was deployed, hayward-63 was added as a device with a /static/ IP address (meaning MAAS automatically picked an IP for the device). The IP address automatically selected by MAAS is the /same/ IP address as the one tucker has been configured with.

Andres Rodriguez (andreserl) wrote :
Changed in maas:
milestone: 1.9.3 → 1.9.4
summary: - Device & deployed machine both have same static IP's
+ [1.9] Device & deployed machine both have same static IP's
Larry Michel (lmic) wrote :

mpontillo was able to log on to the system and do some troubleshooting. In that case, a commissioning system was assigned the IP address (from the static range) of a deployed system while that system was up. Per his analysis, DHCP and MAAS were out of sync. He was able to forcefully delete the lease for us to work around this issue, as the system would not commission. So this is no longer incomplete and should maybe be in the triaged state? I'll move it back to New.

Changed in maas:
status: Incomplete → New
Mike Pontillo (mpontillo) wrote :

What most likely happened is that when MAAS tried to communicate the lease deletion to the DHCP server, either the cluster (rack) controller or the DHCP server could not be reached. MAAS therefore thought the lease was deleted, but DHCP still believed it was a static lease assigned to that MAC.
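To illustrate the kind of drift described here, the sketch below compares two views of the same state: the static allocations MAAS believes exist, and the host entries the DHCP server still holds. Both are modelled as plain {mac: ip} dictionaries. The function name and the data shapes are hypothetical and are not MAAS or DHCP server APIs; how each view would be obtained is left open.

```python
def stale_dhcp_hosts(maas_static, dhcp_hosts):
    """Return DHCP host entries that MAAS no longer accounts for.

    maas_static: {mac: ip} allocations MAAS believes are current (hypothetical).
    dhcp_hosts:  {mac: ip} host entries still present on the DHCP server.
    An entry is stale when MAAS has no allocation of that IP to that MAC.
    """
    return {
        mac: ip
        for mac, ip in dhcp_hosts.items()
        if maas_static.get(mac) != ip
    }


# Illustrative data only: one allocation on the MAAS side, while the DHCP
# side still holds the four host entries from the leases file.
maas_view = {"00:22:99:e0:04:67": "10.244.192.169"}
dhcp_view = {
    "2c:59:e5:41:a8:6c": "10.244.192.169",
    "00:22:99:e0:03:37": "10.244.192.169",
    "00:22:99:e0:04:67": "10.244.192.169",
    "90:b1:1c:5b:37:e4": "10.244.192.169",
}

for mac, ip in sorted(stale_dhcp_hosts(maas_view, dhcp_view).items()):
    print(f"stale host entry: {mac} -> {ip}")
```

With this made-up input, the three entries that the MAAS side does not list are reported as stale. A failed deletion request of the kind described would leave exactly such leftovers on the DHCP side.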

This is fixed by design in MAAS 2.0. As part of the HA changes for MAAS 2.0, MAAS now fully manages the static leases itself rather than relying on DHCP to maintain that state.

At this time, we don't have a plan to port the fix to MAAS 1.9, so marking this "Won't Fix" for MAAS 1.9.

Changed in maas:
importance: Undecided → Wishlist
status: New → Won't Fix
Changed in maas:
milestone: 1.9.4 → none
no longer affects: maas/trunk
Changed in maas:
status: Won't Fix → Fix Committed
milestone: none → 2.0.0
importance: Wishlist → Critical
no longer affects: maas/2.0
Changed in maas:
status: Fix Committed → Fix Released