dynamic_inventory.py generates duplicate IP addresses

Bug #1482375 reported by Nolan Brubaker
This bug affects 1 person
| Affects | Status | Importance | Assigned to | Milestone |
| --- | --- | --- | --- | --- |
| OpenStack-Ansible | Invalid | High | Unassigned | |
| Kilo | Fix Released | High | Nolan Brubaker | |
| Trunk | Invalid | High | Unassigned | |

Bug Description

The inventory data structures changed between Juno and Kilo. When upgrading from Juno to Kilo, this can result in duplicate IP addresses among the containers, particularly new ones introduced in Kilo, such as the repo servers.

In two separate labs, we've had dynamic_inventory.py create new containers (specifically, repo containers) with duplicate IPs.

From the first lab:

```
# scripts/inventory-manage.py -l --sort ansible_ssh_host |egrep '9bc3eb67|4b049b34'
| heat-node1_cinder_api_container-9bc3eb67 | False | cinder_api | heat-node1 | None | 172.29.238.156 | None |
| heat-node2_repo_container-4b049b34 | None | pkg_repo | heat-node2 | None | 172.29.238.156 | None |

# cat /etc/hosts|egrep '9bc3eb67|4b049b34'
172.29.238.156 heat-node2_repo_container-4b049b34
172.29.238.156 heat-node1_cinder_api_container-9bc3eb67
```

From another lab:

```
root@578126-infra01:/opt/rpc-openstack/os-ansible-deployment# scripts/inventory-manage.py --list | awk '/172\./ { print $12 }' | sort | uniq -c | sort -n | tail
      1 172.24.243.25
      1 172.24.243.250
      1 172.24.243.26
      1 172.24.243.28
      1 172.24.243.40
      1 172.24.243.45
      1 172.24.243.67
      1 172.24.243.79
      1 172.24.243.88
      2 172.24.243.230

root@578126-infra01:/opt/rpc-openstack/os-ansible-deployment# scripts/inventory-manage.py --list | grep 172.24.243.230
| 578126-infra01_repo_container-56ab6b39 | None | pkg_repo | 578126-infra01 | None | 172.24.243.230 | None |
| 578128-infra03_rabbit_mq_container-4c96d7b0 | False | rabbit | 578128-infra03 | None | 172.24.243.230 | None |
```
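The `uniq -c` pipeline above works on the table printed by `inventory-manage.py`. A minimal Python sketch of the same check, run directly against the inventory JSON, might look like the following. It assumes the `_meta` → `hostvars` layout visible in the debugger output in this report and treats `ansible_ssh_host` as each host's address; `find_duplicate_ips` and `load_inventory` are hypothetical helper names, not part of os-ansible-deployment.

```python
import json
from collections import defaultdict


def find_duplicate_ips(inventory):
    """Map each address used by more than one host to those hosts' names.

    Assumes (hypothetically) that per-host variables live under
    inventory['_meta']['hostvars'] and carry an 'ansible_ssh_host' key.
    """
    owners = defaultdict(list)
    hostvars = inventory.get('_meta', {}).get('hostvars', {})
    for name, host_vars in hostvars.items():
        address = host_vars.get('ansible_ssh_host')
        if address:  # hosts without an address are ignored
            owners[address].append(name)
    # Keep only addresses claimed by two or more hosts.
    return {ip: sorted(names) for ip, names in owners.items() if len(names) > 1}


def load_inventory(path):
    """Read an inventory JSON file such as openstack_inventory.json."""
    with open(path) as f:
        return json.load(f)
```

For example, `find_duplicate_ips(load_inventory('/etc/openstack_deploy/openstack_inventory.json'))` returns an empty dict when no address is shared, and otherwise lists every host name that claims each duplicated address.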

Here is a greenfield container (one created by Kilo), inspected at a breakpoint in `_set_used_ips()`:
```
578126-infra01# ./dynamic_inventory.py --list --file ./etc/openstack_deploy/
> /opt/rpc-openstack/os-ansible-deployment/playbooks/inventory/dynamic_inventory.py(769)_set_used_ips()
-> for host_entry in inventory['_meta']['hostvars'].values():
(Pdb) pprint(host_entry)
{u'ansible_ssh_host': u'172.24.240.139',
 u'component': u'cinder_scheduler',
 u'container_address': u'172.24.240.139',
 u'container_name': u'578128-infra03_cinder_scheduler_container-8efb1da3',
 u'container_networks': {u'container_address': {u'address': u'172.24.240.139',
                                                u'bridge': u'br-mgmt',
                                                u'interface': u'eth1',
                                                u'netmask': u'255.255.252.0'}},
 u'physical_host': u'578128-infra03',
 u'physical_host_group': u'infra_hosts',
 u'properties': {u'container_release': u'trusty', u'service_name': u'cinder'}}
```

Here is a Juno neutron container in that same loop, using the Kilo dynamic_inventory.py script:
```
578126-infra01# ./dynamic_inventory.py --list --file ./etc/openstack_deploy/
> /opt/rpc-openstack/os-ansible-deployment/playbooks/inventory/dynamic_inventory.py(769)_set_used_ips()
-> for host_entry in inventory['_meta']['hostvars'].values():
(Pdb) pprint(host_entry)
{u'ansible_ssh_host': u'172.24.242.2',
 u'component': u'neutron_agent',
 u'container_address': u'172.24.242.2',
 u'container_name': u'578126-infra01_neutron_agents_container-a798edd6',
 u'container_netmask': u'255.255.252.0',
 u'container_network': {u'container_bridge': u'br-mgmt',
                        u'container_interface': u'eth1',
                        u'container_netmask': u'255.255.252.0'},
 u'is_metal': False,
 u'physical_host': u'578126-infra01',
 u'tunnel_address': u'172.24.238.35',
 u'tunnel_netmask': u'255.255.252.0'}
```
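The two entries above show where the addresses live: the Kilo entry stores its address under `container_networks` → `address`, while the Juno entry keeps it only in top-level keys such as `container_address` and `tunnel_address`. The following simplified sketch (not the script's actual code) contrasts a used-IP collector that reads only the Kilo layout with one that also reads the Juno keys. The function names and the `JUNO_ADDRESS_KEYS` tuple are assumptions for illustration; the `storage` key is taken from the debugging patch in this report.

```python
def collect_used_ips_kilo_only(hostvars):
    """Collect addresses only from the Kilo-style 'container_networks' map."""
    used = set()
    for host_entry in hostvars.values():
        for network_entry in host_entry.get('container_networks', {}).values():
            address = network_entry.get('address')
            if address:
                used.add(address)
    return used


# Hypothetical list of Juno-era top-level address keys.
JUNO_ADDRESS_KEYS = ('container_address', 'tunnel_address', 'storage_address')


def collect_used_ips(hostvars):
    """Collect addresses from both the Kilo and the Juno host layouts."""
    used = collect_used_ips_kilo_only(hostvars)
    for host_entry in hostvars.values():
        for key in JUNO_ADDRESS_KEYS:
            address = host_entry.get(key)
            if address:
                used.add(address)
    return used


# Trimmed copies of the two host entries printed above.
hostvars = {
    '578128-infra03_cinder_scheduler_container-8efb1da3': {
        'container_address': '172.24.240.139',
        'container_networks': {
            'container_address': {'address': '172.24.240.139',
                                  'bridge': 'br-mgmt'}},
    },
    '578126-infra01_neutron_agents_container-a798edd6': {
        'container_address': '172.24.242.2',
        'container_network': {'container_bridge': 'br-mgmt'},
        'tunnel_address': '172.24.238.35',
    },
}

print(sorted(collect_used_ips_kilo_only(hostvars)))  # ['172.24.240.139']
print(sorted(collect_used_ips(hostvars)))
# ['172.24.238.35', '172.24.240.139', '172.24.242.2']
```

The Kilo-only collector never sees the Juno neutron container's addresses, so they stay in the pool of addresses that can be handed to a new container.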

Finally, to reproduce this locally, place the following files in an `etc/openstack_deploy/` directory outside of `/etc/`:

From *Kilo*:

* `/etc/openstack_deploy/env.d/*`
* `/etc/openstack_deploy/openstack_environment.yml`
* `/etc/openstack_deploy/openstack_user_config.yml`
* `/etc/openstack_deploy/conf.d/`

From *Juno*:

* `/etc/rpc_deploy/rpc_inventory.json`, copied to `/etc/openstack_deploy/openstack_inventory.json`.
  * This can be found in the `/etc/rpc_deploy.OLD` directory.
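The staging steps above can be scripted. Here is a minimal Python sketch; `stage_repro_files` and its parameters are hypothetical, the paths in the docstring simply mirror the list above, and any of `env.d/` or `conf.d/` that is missing on the source host is skipped. The `dirs_exist_ok` argument to `shutil.copytree` requires Python 3.8 or later.

```python
import os
import shutil


def stage_repro_files(kilo_etc, juno_inventory, dest):
    """Copy Kilo config plus a Juno inventory into a local reproduction tree.

    kilo_etc:       a Kilo config directory such as /etc/openstack_deploy
    juno_inventory: a Juno inventory such as /etc/rpc_deploy.OLD/rpc_inventory.json
    dest:           the local directory, e.g. ./etc/openstack_deploy
    """
    os.makedirs(dest, exist_ok=True)
    # Kilo configuration: the two YAML files...
    for name in ('openstack_environment.yml', 'openstack_user_config.yml'):
        shutil.copy2(os.path.join(kilo_etc, name), os.path.join(dest, name))
    # ...and the env.d/conf.d trees, skipping any that are not present.
    for subdir in ('env.d', 'conf.d'):
        src = os.path.join(kilo_etc, subdir)
        if os.path.isdir(src):
            shutil.copytree(src, os.path.join(dest, subdir), dirs_exist_ok=True)
    # Juno inventory, renamed to openstack_inventory.json as described above.
    shutil.copy2(juno_inventory, os.path.join(dest, 'openstack_inventory.json'))
```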

You can then apply the following patch, which should exit on the first host whose in-use address is missing from the `USED_IPS` list:

```
diff --git a/playbooks/inventory/dynamic_inventory.py b/playbooks/inventory/dynamic_inventory.py
index b52932a..5ffe2b8 100755
--- a/playbooks/inventory/dynamic_inventory.py
+++ b/playbooks/inventory/dynamic_inventory.py
@@ -22,6 +22,7 @@ import netaddr
 import os
 import Queue
 import random
+import sys
 import tarfile
 import uuid
 import yaml
@@ -755,7 +756,6 @@ def _set_used_ips(user_defined_config, inventory):
                 USED_IPS.extend([str(i) for i in ip_range])
             else:
                 append_if(array=USED_IPS, item=split_ip[0])
-
     # Find all used IP addresses and ensure that they are not used again
     for host_entry in inventory['_meta']['hostvars'].values():
         networks = host_entry.get('container_networks', dict())
@@ -763,6 +763,16 @@ def _set_used_ips(user_defined_config, inventory):
             address = network_entry.get('address')
             if address:
                 append_if(array=USED_IPS, item=address)
+    # Check if any host address is missing from the used IP list
+    for host_entry in inventory['_meta']['hostvars'].values():
+        for net_type in ('container', 'tunnel', 'storage'):
+            address = host_entry.get('%s_address' % net_type)
+            # Hosts without an address on this network are skipped; only
+            # addresses that exist but were never recorded are reported.
+            if address and address not in USED_IPS:
+                print("%s_address for %s was not found in USED_IPS"
+                      % (net_type, host_entry.get('container_name')))
+                sys.exit(255)
```

You can then run `playbooks/inventory/dynamic_inventory.py --list --file etc/openstack_deploy`.

OpenStack Infra (hudson-openstack) wrote : Fix proposed to os-ansible-deployment (kilo)

Fix proposed to branch: kilo
Review: https://review.openstack.org/210152

Nolan Brubaker (nolan-brubaker) wrote :

The bug is currently being fixed in kilo, and may need to be forward ported to master.

OpenStack Infra (hudson-openstack) wrote : Change abandoned on os-ansible-deployment (kilo)

Change abandoned by Nolan Brubaker (<email address hidden>) on branch: kilo
Review: https://review.openstack.org/210152
Reason: This has been abandoned in favor of https://review.openstack.org/#/c/210164/. The dynamic inventory script already correctly restructures data between releases, but there is a window of error in which the data is not yet structured for Kilo and the used IP list does not look for the Juno information either.

Nolan Brubaker (nolan-brubaker) wrote :

I am marking Trunk as Won't Fix/Wishlist because this should *not* be forward ported to Liberty.

Liberty should NOT have support for Juno upgrades.

Nolan Brubaker (nolan-brubaker) wrote :

The included tar file should be extracted at the root of an os-ansible-deployment Kilo checkout.

The exercise.sh script will repeatedly run the dynamic_inventory.py script on a Juno inventory file moved into place, and verify that it does not have any IP addresses listed more than once.

This was used to test https://review.openstack.org/#/c/210164/
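The `exercise.sh` script itself is attached to the bug rather than reproduced here. A rough Python sketch of the loop it describes might look like the following: restore a pristine copy of the Juno inventory before each run, invoke the inventory script, and record any run in which an address appears more than once. The command mirrors the one used earlier in this report; the run count, the function names, the `run_inventory` hook, and the assumption that `--list` prints JSON with a `_meta` → `hostvars` section keyed by `ansible_ssh_host` are all hypothetical.

```python
import json
import shutil
import subprocess
from collections import Counter

# Command used earlier in this report; run from the root of a Kilo checkout.
INVENTORY_CMD = ['playbooks/inventory/dynamic_inventory.py',
                 '--list', '--file', 'etc/openstack_deploy']


def duplicated_addresses(inventory):
    """Return, sorted, the addresses that appear on more than one host."""
    counts = Counter(
        host_vars['ansible_ssh_host']
        for host_vars in inventory['_meta']['hostvars'].values()
        if host_vars.get('ansible_ssh_host'))
    return sorted(ip for ip, n in counts.items() if n > 1)


def exercise(juno_inventory, target, runs=50, run_inventory=None):
    """Regenerate the inventory from a Juno copy `runs` times.

    Returns a list of (attempt, duplicated addresses) for failing runs.
    run_inventory is a hypothetical hook returning the parsed inventory;
    by default it runs INVENTORY_CMD and parses its JSON output.
    """
    if run_inventory is None:
        def run_inventory():
            return json.loads(subprocess.check_output(INVENTORY_CMD))
    failures = []
    for attempt in range(runs):
        # Put the untouched Juno inventory back so every run starts from it.
        shutil.copy2(juno_inventory, target)
        dupes = duplicated_addresses(run_inventory())
        if dupes:
            failures.append((attempt, dupes))
    return failures
```

Repeating the run matters because, as the commit message below explains, addresses are picked at random, so a single clean run does not prove the fix.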

OpenStack Infra (hudson-openstack) wrote : Fix merged to os-ansible-deployment (kilo)

Reviewed: https://review.openstack.org/210164
Committed: https://git.openstack.org/cgit/stackforge/os-ansible-deployment/commit/?id=4932bdfa2f110de7460e5a32916afb796173260b
Submitter: Jenkins
Branch: kilo

commit 4932bdfa2f110de7460e5a32916afb796173260b
Author: Nolan Brubaker <email address hidden>
Date: Thu Aug 6 20:34:48 2015 -0400

    Store used IPs from Juno config during upgrade

    Kilo introduced a new structure to the networks for containers, and most
    of the dynamic inventory script accounts for the differences between
    Juno and Kilo. However, populating the used IP list happens prior to
    restructuring the data on the first execution of Kilo code, which means
    that the used IP processing is not looking for the right keys in the
    host variables.

    This results in non-deterministic IP address clashes, particularly with
    repo containers, during upgrades. The IP addresses for a given network
    are pulled from the pool of available IPs randomly.

    Also, this change should only apply to Kilo; Liberty should not know
    about Juno inventory details, since upgrades should only pass through
    one version at a time.

    Change-Id: Iae47e27cc7e926afe860e4896bf9f1ab291bc7c8
    Closes-Bug: #1482375
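
The commit message notes that addresses for a network are drawn at random from the pool of available IPs. The toy sketch below (not the script's real allocator) shows why a single in-use address missing from the used list causes intermittent rather than consistent clashes: a new container only collides when the random pick lands on the unrecorded address. The small /29 pool, the `pick_address` helper, and the use of the standard `ipaddress` module are assumptions; the script itself imports `netaddr`. The address 172.24.243.230 is borrowed from the rabbit_mq/repo clash shown earlier.

```python
import ipaddress
import random


def pick_address(cidr, used_ips, rng=random):
    """Pick a random host address from cidr that is not in used_ips."""
    available = [str(ip) for ip in ipaddress.ip_network(cidr).hosts()
                 if str(ip) not in used_ips]
    return rng.choice(available)


# 172.24.243.230 is really in use (say, by a Juno container), but the
# used IP list was built without it, as in this bug.
in_use = {'172.24.243.226', '172.24.243.227', '172.24.243.230'}
recorded = in_use - {'172.24.243.230'}

rng = random.Random(0)  # fixed seed so the run is repeatable
picks = [pick_address('172.24.243.224/29', recorded, rng) for _ in range(1000)]
clashes = sum(1 for ip in picks if ip in in_use)
print(clashes, 'of', len(picks), 'allocations reused an address already in use')
```

The /29 has six usable hosts; with two recorded as used, four candidates remain, so on average about one pick in four lands on the unrecorded address.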

no longer affects: openstack-ansible/trunk
Davanum Srinivas (DIMS) (dims-v) wrote : Fix included in openstack/openstack-ansible 11.2.11

This issue was fixed in the openstack/openstack-ansible 11.2.11 release.

Doug Hellmann (doug-hellmann) wrote : Fix included in openstack/openstack-ansible 11.2.12

This issue was fixed in the openstack/openstack-ansible 11.2.12 release.

Davanum Srinivas (DIMS) (dims-v) wrote : Fix included in openstack/openstack-ansible 11.2.14

This issue was fixed in the openstack/openstack-ansible 11.2.14 release.
