2016-05-12 23:19:32 |
Larry Michel |
description |
While troubleshooting an issue with juju bootstrap where connections to port 17070 are periodically refused, I noticed that I was sshing into the wrong system when trying to access the bootstrap node. It turns out that the same IP address was assigned to 2 systems.
This actually happened once before, and at the time it had me very confused. Here's the scenario:
1) During deployment, juju deployer lost contact with the bootstrap node, hayward-63, which was assigned the address 10.244.192.169.
2) To preserve the server in that state, I modified its power parameters to prevent it from being powered off and marked it as broken.
3) Today the server was powered off and powered back on through the BMC.
4) I then tried to ssh into it (hayward-63) but ended up sshing into another server, tucker.
5) I then deleted hayward-63, added it back as a device, and selected Static. The device was then assigned the same address, 10.244.192.169.
FQDN                   MAC                IP Assignment  IP Address      Owner
hayward-63.oilstaging  00:22:99:e0:04:67  Static         10.244.192.169  root
Looking in the dhcp.leases file, I find both this MAC address and tucker's MAC address associated with this IP, which I think shows that this is a duplicate lease. There are a total of 4 entries, and they say dynamic even though this IP is originally from the static range.
host 2c-59-e5-41-a8-6c {
dynamic;
hardware ethernet 2c:59:e5:41:a8:6c;
fixed-address 10.244.192.169;
}
host 00-22-99-e0-03-37 {
dynamic;
hardware ethernet 00:22:99:e0:03:37;
fixed-address 10.244.192.169;
}
host 00-22-99-e0-04-67 {
dynamic;
hardware ethernet 00:22:99:e0:04:67;
fixed-address 10.244.192.169;
}
host 90-b1-1c-5b-37-e4 {
dynamic;
hardware ethernet 90:b1:1c:5b:37:e4;
fixed-address 10.244.192.169;
}
We've hit the original issue quite a bit, and while it's not clear whether the duplicate IP is causing it, I plan on checking the lease file every time we hit this issue.
I am attaching the maas log and lease files.
Note that hayward-63 was deployed 2 days ago, and tucker was deployed 6 days ago and was being used to onboard a new network adapter, so its network configuration is different. Even though it's still in the deployed state, there is no DNS entry for it:
ubuntu@tucker:~$ ifconfig
ens1 Link encap:Ethernet HWaddr 7c:fe:90:b7:28:10
inet addr:10.244.166.89 Bcast:10.244.191.255 Mask:255.255.192.0
inet6 addr: fe80::7efe:90ff:feb7:2810/64 Scope:Link
UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
RX packets:258278 errors:0 dropped:0 overruns:0 frame:0
TX packets:3498 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:1000
RX bytes:24829812 (24.8 MB) TX bytes:708276 (708.2 KB)
eth0 Link encap:Ethernet HWaddr 2c:59:e5:41:a8:6c
inet addr:10.244.192.169 Bcast:10.244.255.255 Mask:255.255.192.0
inet6 addr: fe80::2e59:e5ff:fe41:a86c/64 Scope:Link
UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
RX packets:3502316 errors:0 dropped:0 overruns:0 frame:0
TX packets:128877 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:1000
RX bytes:301940215 (301.9 MB) TX bytes:7423778 (7.4 MB)
Memory:fbd00000-fbdfffff
eth1 Link encap:Ethernet HWaddr 2c:59:e5:41:a8:6d
inet6 addr: fe80::2e59:e5ff:fe41:a86d/64 Scope:Link
UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
RX packets:3326222 errors:0 dropped:0 overruns:0 frame:0
TX packets:8 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:1000
RX bytes:218905125 (218.9 MB) TX bytes:648 (648.0 B)
Memory:fbb00000-fbbfffff
...
ubuntu@maas-integration-september:~$ ping tucker.oilstaging
ping: unknown host tucker.oilstaging
Also note that this bug looks very similar to bug 1562226. |
While troubleshooting an issue with juju bootstrap where connections to port 17070 are periodically refused, I noticed that I was sshing into the wrong system when trying to access the bootstrap node. It turns out that the same IP address was assigned to 2 systems.
This actually happened once before: I found myself sshing into the wrong system, but at the time I was not sure whether I had the right system in the first place.
Here's the scenario for this bug:
1) During deployment, juju deployer lost contact with the bootstrap node, hayward-63, which had been assigned the address 10.244.192.169.
2) To preserve the server in that state for debugging purposes, I modified its power parameters to prevent it from being powered off, and also marked it as broken.
3) Today hayward-63 was powered off and powered back on through the BMC.
4) I then tried to ssh into it (hayward-63) but ended up sshing into another server, tucker.
5) I then deleted hayward-63 from MAAS and added it back as a device with Static selected, so I could boot it from disk and get it back to a state where I could debug it. I then noticed that the device was assigned the same address, 10.244.192.169, that it had been assigned during deployment.
This is the device record from the MAAS UI:
FQDN                   MAC                IP Assignment  IP Address      Owner
hayward-63.oilstaging  00:22:99:e0:04:67  Static         10.244.192.169  root
Looking in the dhcp.leases file, I found both the MAC address of hayward-63's interface and tucker's MAC address in DHCP records for the same IP address, 10.244.192.169, which I think shows that this is a duplicated lease.
Looking further, I see that there are a total of 4 entries for this address, and they all say dynamic (even though this IP originally came from the static range; I'm not sure whether this is by design).
host 2c-59-e5-41-a8-6c {
dynamic;
hardware ethernet 2c:59:e5:41:a8:6c;
fixed-address 10.244.192.169;
}
host 00-22-99-e0-03-37 {
dynamic;
hardware ethernet 00:22:99:e0:03:37;
fixed-address 10.244.192.169;
}
host 00-22-99-e0-04-67 {
dynamic;
hardware ethernet 00:22:99:e0:04:67;
fixed-address 10.244.192.169;
}
host 90-b1-1c-5b-37-e4 {
dynamic;
hardware ethernet 90:b1:1c:5b:37:e4;
fixed-address 10.244.192.169;
}
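Checking the lease file for this kind of duplicate can be scripted. The sketch below is a minimal, hypothetical helper (the function name `find_duplicate_fixed_addresses` and the script name are made up); it only understands the host-block layout shown above and is not a full dhcpd.leases parser. It assumes, based on my understanding of ISC dhcpd's append-only lease journal, that a later block for the same host name supersedes an earlier one and that a block containing `deleted;` removes the host. As far as I understand ISC dhcpd, `dynamic;` in a host block may simply mark an entry added at runtime over OMAPI rather than refer to the dynamic address range, but that is worth confirming against the dhcpd.leases man page.

```python
import re
import sys

# Host blocks as they appear in the lease file above, e.g.:
#   host 2c-59-e5-41-a8-6c { dynamic; hardware ethernet ...; fixed-address ...; }
# Assumption: later blocks for the same host name override earlier ones,
# and a block containing "deleted;" removes that host.
HOST_BLOCK_RE = re.compile(r"^\s*host\s+(\S+)\s*\{(.*?)\}", re.DOTALL | re.MULTILINE)
MAC_RE = re.compile(r"hardware\s+ethernet\s+([0-9A-Fa-f:]+)\s*;")
IP_RE = re.compile(r"fixed-address\s+([0-9.]+)\s*;")
DELETED_RE = re.compile(r"\bdeleted\s*;")


def find_duplicate_fixed_addresses(text):
    """Return {ip: [mac, ...]} for every fixed address held by more than one MAC.

    Hypothetical helper: handles only the simple host-block layout shown
    in this report.
    """
    hosts = {}  # host name -> (mac, ip); the last block seen for a name wins
    for name, body in HOST_BLOCK_RE.findall(text):
        if DELETED_RE.search(body):
            hosts.pop(name, None)
            continue
        mac, ip = MAC_RE.search(body), IP_RE.search(body)
        if mac and ip:
            hosts[name] = (mac.group(1).lower(), ip.group(1))

    # Group MACs by the fixed address they were given.
    macs_by_ip = {}
    for mac, ip in hosts.values():
        macs_by_ip.setdefault(ip, set()).add(mac)
    return {ip: sorted(macs) for ip, macs in macs_by_ip.items() if len(macs) > 1}


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage (hypothetical script name): python3 find_dup_leases.py <leases file>
    with open(sys.argv[1]) as leases:
        for ip, macs in sorted(find_duplicate_fixed_addresses(leases.read()).items()):
            print(f"{ip}: {', '.join(macs)}")
```

Run against the four blocks above, this would report 10.244.192.169 with all four MAC addresses, assuming no later `deleted;` blocks for those hosts appear further down the file.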
We've hit the original issue (refused connections to port 17070) quite a bit, and while it's not clear whether the duplicate IP is causing it, I plan on checking the lease file every time we hit this issue.
I am attaching the maas log and lease files.
Note that hayward-63 was deployed 2 days ago, and tucker was deployed 6 days ago and was being used to onboard a new network adapter, so its network configuration is different. Even though it's still in the deployed state, there is no DNS entry for it:
ubuntu@tucker:~$ ifconfig
ens1 Link encap:Ethernet HWaddr 7c:fe:90:b7:28:10
inet addr:10.244.166.89 Bcast:10.244.191.255 Mask:255.255.192.0
inet6 addr: fe80::7efe:90ff:feb7:2810/64 Scope:Link
UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
RX packets:258278 errors:0 dropped:0 overruns:0 frame:0
TX packets:3498 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:1000
RX bytes:24829812 (24.8 MB) TX bytes:708276 (708.2 KB)
eth0 Link encap:Ethernet HWaddr 2c:59:e5:41:a8:6c
inet addr:10.244.192.169 Bcast:10.244.255.255 Mask:255.255.192.0
inet6 addr: fe80::2e59:e5ff:fe41:a86c/64 Scope:Link
UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
RX packets:3502316 errors:0 dropped:0 overruns:0 frame:0
TX packets:128877 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:1000
RX bytes:301940215 (301.9 MB) TX bytes:7423778 (7.4 MB)
Memory:fbd00000-fbdfffff
eth1 Link encap:Ethernet HWaddr 2c:59:e5:41:a8:6d
inet6 addr: fe80::2e59:e5ff:fe41:a86d/64 Scope:Link
UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
RX packets:3326222 errors:0 dropped:0 overruns:0 frame:0
TX packets:8 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:1000
RX bytes:218905125 (218.9 MB) TX bytes:648 (648.0 B)
Memory:fbb00000-fbbfffff
...
ubuntu@maas-integration-september:~$ ping tucker.oilstaging
ping: unknown host tucker.oilstaging
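To make tucker's different network configuration concrete, the addresses and masks from the ifconfig output can be checked with Python's standard `ipaddress` module. This is a minimal sketch: `subnet_summary` is a hypothetical helper, and the values are transcribed by hand from the output above. It shows ens1 and eth0 on two different /18 networks, with broadcast addresses matching the Bcast values ifconfig reports, and 10.244.192.169 falling on eth0's network.

```python
import ipaddress

# "inet addr" and "Mask" values transcribed from tucker's ifconfig output.
TUCKER_INTERFACES = {
    "ens1": "10.244.166.89/255.255.192.0",
    "eth0": "10.244.192.169/255.255.192.0",
}


def subnet_summary(interfaces):
    """Map interface name -> (network in CIDR form, broadcast address)."""
    summary = {}
    for name, spec in interfaces.items():
        iface = ipaddress.ip_interface(spec)  # accepts address/dotted-netmask
        summary[name] = (str(iface.network), str(iface.network.broadcast_address))
    return summary


if __name__ == "__main__":
    for name, (network, broadcast) in subnet_summary(TUCKER_INTERFACES).items():
        print(f"{name}: network {network}, broadcast {broadcast}")
    contested = ipaddress.ip_address("10.244.192.169")
    eth0_net = ipaddress.ip_interface(TUCKER_INTERFACES["eth0"]).network
    print("10.244.192.169 on eth0's network:", contested in eth0_net)
```

This prints `ens1: network 10.244.128.0/18, broadcast 10.244.191.255`, `eth0: network 10.244.192.0/18, broadcast 10.244.255.255`, and `10.244.192.169 on eth0's network: True`.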
Also note that this bug looks very similar to bug 1562226. |
|