The node IP reported by the 'fuel nodes' command differs from the real one

Bug #1455473 reported by Alexei Sheplyakov
This bug affects 1 person
Affects | Status | Importance | Assigned to | Milestone
--------|--------|------------|-------------|----------
Fuel for OpenStack | Triaged | Low | Fuel Sustaining |
Mitaka | Won't Fix | Low | Fuel Python (Deprecated) |
Newton | Triaged | Low | Fuel Sustaining |

Bug Description

fuel nodes

id | status | name | cluster | ip | mac | roles | pending_roles | online | group_id
---|----------|----------------------------|---------|------------|-------------------|-------|---------------|--------|---------
13 | discover | compute (bay-15) (c2:d4) | 7 | 10.20.0.5 | 38:ea:a7:35:c2:d4 | | compute | True | 7
14 | discover | compute (bay-14) (a0:84) | 7 | 10.20.0.4 | 38:ea:a7:90:a0:84 | | compute | True | 7
15 | discover | controller (bay-3) (c0:80) | 7 | 10.20.0.9 | 38:ea:a7:35:c0:80 | | controller | True | 7
16 | discover | ceph (bay-6) (c2:4c) | 7 | 10.20.0.6 | 38:ea:a7:35:c2:4c | | ceph-osd | True | 7
17 | discover | controller (bay-5) (32:b4) | 7 | 10.20.0.12 | 38:ea:a7:11:32:b4 | | controller | True | 7
18 | discover | mongo (bay-11) (c3:78) | 7 | 10.20.0.7 | 38:ea:a7:35:c3:78 | | mongo | True | 7
19 | discover | controller (bay-4) (bf:d0) | 7 | 10.20.0.3 | 38:ea:a7:35:bf:d0 | | controller | True | 7
20 | discover | ceph (bay-8) (2d:84) | 7 | 10.20.0.12 | a6:54:1a:94:86:42 | | ceph-osd | False | 7
21 | discover | zabbix (bay-16) (c3:e0) | 7 | 10.20.0.8 | 38:ea:a7:35:c3:e0 | | zabbix-server | True | 7
22 | discover | ceph (bay-7) (2a:ac) | 7 | 10.20.0.11 | 38:ea:a7:11:2a:ac | | ceph-osd | True | 7
23 | discover | mongo (bay-12) (2a:3c) | 7 | 10.20.0.14 | 38:ea:a7:11:2a:3c | | mongo | True | 7
24 | discover | mongo (bay-13) (32:50) | 7 | 10.20.0.13 | 38:ea:a7:11:32:50 | | mongo | True | 7

(http://paste.openstack.org/show/189347)

Note that nodes 17 and 20 are reported with the same IP address (10.20.0.12).
However, the DHCP server (dnsmasq) assigns the addresses correctly, and they are unique:

[root@f075ca226654 ~]# cat /var/lib/dnsmasq/dnsmasq.leases
1425578387 38:ea:a7:11:2a:3c 10.20.0.14 * 00:37:33:35:31:35:31:36:43:55:34:34:39:34:35:42:50
1425578382 38:ea:a7:11:32:50 10.20.0.13 * 00:37:33:35:31:35:31:36:43:55:34:34:39:34:35:42:34
1425578377 38:ea:a7:11:32:b4 10.20.0.12 * 00:37:33:35:31:35:31:36:43:55:34:34:39:34:35:42:41
1425578375 38:ea:a7:11:2a:ac 10.20.0.11 * *
1425578371 38:ea:a7:35:c0:80 10.20.0.9 * 00:37:33:35:31:35:31:36:43:55:34:32:37:46:46:33:37
1425578369 38:ea:a7:35:c3:e0 10.20.0.8 * 00:37:33:35:31:35:31:36:43:55:34:32:37:46:46:33:4d
1425578367 38:ea:a7:35:c3:78 10.20.0.7 * 00:37:33:35:31:35:31:36:43:55:34:32:37:46:46:33:50
1425578371 38:ea:a7:35:c2:4c 10.20.0.6 * 00:37:33:35:31:35:31:36:43:55:34:32:37:46:46:33:43
1425578369 38:ea:a7:35:c2:d4 10.20.0.5 * 00:37:33:35:31:35:31:36:43:55:34:32:37:46:46:33:35
1425578370 38:ea:a7:90:a0:84 10.20.0.4 * 00:37:33:35:31:35:31:36:43:55:34:32:37:46:46:33:33
1425578368 38:ea:a7:35:bf:d0 10.20.0.3 * 00:37:33:35:31:35:31:36:43:55:34:32:37:46:46:33:46

(http://paste.openstack.org/show/189328)
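The duplicate can also be spotted mechanically. A minimal Python sketch, assuming the pipe-separated layout of the 'fuel nodes' table above (a header row, a dashed separator row, then one row per node); duplicate_ips is a hypothetical helper, not part of the Fuel client:

```python
from collections import defaultdict


def duplicate_ips(fuel_nodes_output: str) -> dict:
    """Return {ip: [node ids]} for every IP listed on more than one row.

    Assumes the 'fuel nodes' table layout: header, dashed separator, node rows.
    """
    rows = [line for line in fuel_nodes_output.strip().splitlines() if line.strip()]
    header = [cell.strip() for cell in rows[0].split("|")]
    id_col, ip_col = header.index("id"), header.index("ip")
    ids_by_ip = defaultdict(list)
    for row in rows[2:]:  # skip the header and the dashed separator line
        cells = [cell.strip() for cell in row.split("|")]
        ids_by_ip[cells[ip_col]].append(cells[id_col])
    return {ip: ids for ip, ids in ids_by_ip.items() if len(ids) > 1}
```

Fed the full table from this report, it should return {'10.20.0.12': ['17', '20']}.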

Steps to reproduce:

1. Make sure the dhcp-sequential-ip option is switched off in dnsmasq.conf.
2. Boot about 20 to 40 nodes (VMs are fine too) simultaneously and wait until they are discovered.
3. Ask nailgun about the known nodes by running
    fuel nodes
   on the master node.
4. Compare the listed IP addresses with the real ones (run 'ip neigh show' or look into the dnsmasq leases file).
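Step 4 can be scripted. The sketch below is a hypothetical Python helper (not part of Fuel); it assumes the dnsmasq leases line format seen above, "<expiry> <MAC> <IP> <hostname> <client-id>", and a MAC-to-IP mapping taken from the 'fuel nodes' output:

```python
def parse_leases(leases_text: str) -> dict:
    """Map MAC -> IP from dnsmasq.leases lines: <expiry> <mac> <ip> <hostname> <client-id>."""
    leases = {}
    for line in leases_text.splitlines():
        fields = line.split()
        if len(fields) >= 3:  # skip blank or malformed lines
            leases[fields[1].lower()] = fields[2]
    return leases


def ip_mismatches(reported: dict, leases: dict) -> dict:
    """Return {mac: (reported_ip, leased_ip or None)} where the two disagree."""
    result = {}
    for mac, ip in reported.items():
        leased = leases.get(mac.lower())
        if leased != ip:
            result[mac] = (ip, leased)
    return result
```

For node 20 in this report it would flag a6:54:1a:94:86:42 as ('10.20.0.12', None), since that MAC has no lease at all.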

Preliminary analysis:

dnsmasq can offer (DHCPOFFER) the same address to different clients. This behavior is explicitly permitted by RFC 2131 [1].
This is not a problem in itself, since the address is not assigned to the client until the server has ACK'ed it.
However, the nailgun agent sometimes reports the IP from the DHCPOFFER even when the server NAKs that address
(and the DHCP client running on the node in question correctly obtains a new IP).
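To illustrate the point, here is a toy Python model of the client side of the exchange (hypothetical, not nailgun agent code): an address from a DHCPOFFER is only tentative, and only an address confirmed by a DHCPACK should ever be reported. The class name and the addresses used with it are made up for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class DhcpClientModel:
    """Toy model of the client side of the RFC 2131 exchange (illustration only)."""

    offered: Optional[str] = None  # from DHCPOFFER: tentative, may still be NAK'ed
    bound: Optional[str] = None    # from DHCPACK: the only address safe to report

    def on_offer(self, ip: str) -> None:
        self.offered = ip

    def on_ack(self, ip: str) -> None:
        self.bound, self.offered = ip, None

    def on_nak(self) -> None:
        # The server refused the requested address: forget it and start over.
        self.offered = self.bound = None

    def reportable_ip(self) -> Optional[str]:
        # Reporting self.offered here would reproduce the bug described above.
        return self.bound
```

For example, after on_offer("10.20.0.12") followed by on_nak(), reportable_ip() returns None until a later DHCPACK arrives.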

description: updated
Dmitry Pyzhov (dpyzhov)
tags: added: module-nailgun-agent
Changed in fuel:
assignee: nobody → Fuel Python Team (fuel-python)
status: New → Confirmed
description: updated
Łukasz Oleś (loles) wrote :

I don't see a6:54:1a:94:86:42 in /var/lib/dnsmasq/dnsmasq.leases
and nailgun says that it's offline. Can you check it?

Mike Scherbakov (mihgen) wrote :

Are you guys sure it's Medium? Do we need a known issue describing how users can fix this?

Alexei Sheplyakov (asheplyakov) wrote :

There's a workaround (the --dhcp-sequential-ip dnsmasq option) which makes this bug quite difficult to reproduce, hence the Medium importance.
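For step 1 of the reproduction, or to confirm that the workaround is in place, a small Python sketch can report whether a dnsmasq.conf text enables the option. According to the dnsmasq man page, options in the config file are the long option names without the leading '--', and lines starting with '#' are comments. By default dnsmasq picks a client's address from a hash of its MAC, while dhcp-sequential-ip makes it allocate addresses sequentially. The function name is hypothetical:

```python
def sequential_ip_enabled(conf_text: str) -> bool:
    """True if a dnsmasq.conf text turns on dhcp-sequential-ip.

    In the config file, long options are written without the leading '--';
    lines starting with '#' are comments.
    """
    for line in conf_text.splitlines():
        option = line.strip()
        if option.startswith("#"):
            continue
        if option == "dhcp-sequential-ip":
            return True
    return False
```

On the master node this could be called as sequential_ip_enabled(open("/etc/dnsmasq.conf").read()). The path is an assumption, since dnsmasq may be started with a different config file.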

Vladimir Sharshov (vsharshov) wrote :

We have a workaround, so moving it to 8.0.

Changed in fuel:
status: Confirmed → Won't Fix
Dmitry Pyzhov (dpyzhov)
no longer affects: fuel/8.0.x
Changed in fuel:
status: Won't Fix → Confirmed
milestone: 7.0 → 8.0
Dmitry Pyzhov (dpyzhov) wrote :

Nailgun agent needs to understand when IP is not ACK'ed and not report it.
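One way to follow this suggestion is to report only the addresses actually configured on the node's interfaces. A normal DHCP client configures an address only after the DHCPACK. Below is a hypothetical Python sketch that reads these addresses from the one-line output of iproute2 ('ip -o -4 addr show'). It assumes the usual "<index>: <ifname> inet <addr>/<prefix> ..." layout and is not taken from the nailgun agent:

```python
import subprocess


def parse_ipv4_addrs(ip_output: str) -> dict:
    """Map interface -> IPv4 addresses from `ip -o -4 addr show` output."""
    addrs = {}
    for line in ip_output.splitlines():
        # e.g. "2: eth0    inet 10.20.0.12/24 brd 10.20.0.255 scope global eth0 ..."
        fields = line.split()
        if len(fields) >= 4 and fields[2] == "inet":
            addrs.setdefault(fields[1], []).append(fields[3].split("/")[0])
    return addrs


def configured_ipv4_addrs() -> dict:
    """Ask the kernel (via iproute2) which IPv4 addresses are actually assigned."""
    result = subprocess.run(
        ["ip", "-o", "-4", "addr", "show"],
        capture_output=True, text=True, check=True,
    )
    return parse_ipv4_addrs(result.stdout)
```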

tags: added: feature
Alexei Sheplyakov (asheplyakov) wrote :

> Nailgun agent needs to understand when IP is not ACK'ed and not report it.

Possible solutions: 1) restart the nailgun agent from a DHCP hook script; 2) track the interfaces' IP address change events (via netlink).
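For option 1, here is a sketch of the decision such a hook could make. It assumes the node uses ISC dhclient, whose dhclient-script receives the variables reason, new_ip_address and old_ip_address in its environment. The sketch leaves out how the agent is actually restarted, and should_restart_agent is a hypothetical name:

```python
import os

# dhclient-script reasons after which a DHCPACK'ed lease has been applied.
LEASE_APPLIED = {"BOUND", "RENEW", "REBIND", "REBOOT"}


def should_restart_agent(env: dict) -> bool:
    """Restart the agent only when an ACK'ed lease changed the address."""
    if env.get("reason") not in LEASE_APPLIED:
        return False
    return env.get("new_ip_address") != env.get("old_ip_address")


if __name__ == "__main__":
    # Assumed to be run from a dhclient exit hook, so dhclient-script's
    # variables are inherited via os.environ.
    print("restart" if should_restart_agent(dict(os.environ)) else "keep")
```

Keying on reason means an offered but NAK'ed address never triggers a report, because dhclient-script is only called with BOUND and the related reasons once the server has ACK'ed a lease.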

Changed in fuel:
status: Confirmed → Triaged
importance: Medium → Low
Dmitry Pyzhov (dpyzhov)
tags: added: area-python
Alexander Kislitsky (akislitsky) wrote :

We passed SCF in 8.0. Moving the bug to 9.0.

Changed in fuel:
milestone: 8.0 → 9.0