Nodes periodic-tripleo-ci-centos-7-ovb-3ctlr_1comp-featureset001-master fail to get usable IPs through os-net-config with Error, some other host (BE:E5:4F:B9:21:B0) already uses address

Bug #1818060 reported by Gabriele Cerami on 2019-02-28
Affects: tripleo
Importance: Critical
Assigned to: Gabriele Cerami

Bug Description

Logs at

https://logs.rdoproject.org/openstack-periodic/git.openstack.org/openstack-infra/tripleo-ci/master/periodic-tripleo-ci-centos-7-ovb-3ctlr_1comp-featureset001-master/0398d7f/logs/undercloud/home/zuul/overcloud_deploy.log.txt.gz#_2019-02-28_01_33_47

show that nodes are failing to get a usable IP address.
First, this error appears:

2019-02-28 01:33:47 | [2019/02/28 01:31:01 AM] [INFO] running ifup on interface: eth1
2019-02-28 01:33:47 | [2019/02/28 01:31:01 AM] [ERROR] Failure(s) occurred when applying configuration
2019-02-28 01:33:47 | [2019/02/28 01:31:01 AM] [ERROR] stdout: ERROR : [/etc/sysconfig/network-scripts/ifup-eth] Error, some other host (BE:E5:4F:B9:21:B0) already uses address 172.18.0.79.
2019-02-28 01:33:47 | , stderr:
2019-02-28 01:33:47 | Traceback (most recent call last):
2019-02-28 01:33:47 | File "/bin/os-net-config", line 10, in <module>
2019-02-28 01:33:47 | sys.exit(main())
2019-02-28 01:33:47 | File "/usr/lib/python2.7/site-packages/os_net_config/cli.py", line 295, in main
2019-02-28 01:33:47 | activate=not opts.no_activate)
2019-02-28 01:33:47 | File "/usr/lib/python2.7/site-packages/os_net_config/impl_ifcfg.py", line 1696, in apply
2019-02-28 01:33:47 | raise os_net_config.ConfigurationError(message)
2019-02-28 01:33:47 | os_net_config.ConfigurationError: Failure(s) occurred when applying configuration
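
To see how many distinct addresses are hit by this conflict in a given deploy log, something like the following might help. It is only a sketch: the function name `find_address_conflicts` is made up for illustration, and the pattern assumes the message always has the exact wording of the ifup-eth line quoted above.

```python
import re

# Matches the duplicate-address message as it appears in the log above.
DUP_ADDR_RE = re.compile(
    r"some other host \((?P<mac>[0-9A-Fa-f:]{17})\) already uses address "
    r"(?P<ip>\d{1,3}(?:\.\d{1,3}){3})"
)


def find_address_conflicts(log_text):
    """Return a sorted list of unique (ip, mac) pairs reported as conflicts."""
    pairs = {
        (m.group("ip"), m.group("mac").upper())
        for m in DUP_ADDR_RE.finditer(log_text)
    }
    return sorted(pairs)


sample = (
    "2019-02-28 01:33:47 | [2019/02/28 01:31:01 AM] [ERROR] stdout: ERROR : "
    "[/etc/sysconfig/network-scripts/ifup-eth] Error, some other host "
    "(BE:E5:4F:B9:21:B0) already uses address 172.18.0.79.\n"
)
print(find_address_conflicts(sample))
```

Running it over the whole overcloud_deploy.log would show whether one stale MAC is behind all the conflicts or several.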

Then, possibly as a consequence:

2019-02-28 01:33:47 | [2019/02/28 01:33:41 AM] [ERROR] Failure(s) occurred when applying configuration
2019-02-28 01:33:47 | [2019/02/28 01:33:41 AM] [ERROR] stdout:
2019-02-28 01:33:47 | Determining IP information for eth5... failed.
2019-02-28 01:33:47 | , stderr:
2019-02-28 01:33:47 | [2019/02/28 01:33:41 AM] [ERROR] stdout:
2019-02-28 01:33:47 | Determining IP information for eth4... failed.
2019-02-28 01:33:47 | , stderr:
2019-02-28 01:33:47 | [2019/02/28 01:33:41 AM] [ERROR] stdout:
2019-02-28 01:33:47 | Determining IP information for eth3... failed.
2019-02-28 01:33:47 | , stderr:
2019-02-28 01:33:47 | [2019/02/28 01:33:41 AM] [ERROR] stdout:
2019-02-28 01:33:47 | Determining IP information for eth2... failed.
2019-02-28 01:33:47 | , stderr:
2019-02-28 01:33:47 | [2019/02/28 01:33:41 AM] [ERROR] stdout:
2019-02-28 01:33:47 | Determining IP information for eth1... failed.
2019-02-28 01:33:47 | , stderr:
2019-02-28 01:33:47 | Traceback (most recent call last):
2019-02-28 01:33:47 | File "/bin/os-net-config", line 10, in <module>
2019-02-28 01:33:47 | sys.exit(main())
2019-02-28 01:33:47 | File "/usr/lib/python2.7/site-packages/os_net_config/cli.py", line 295, in main
2019-02-28 01:33:47 | activate=not opts.no_activate)
2019-02-28 01:33:47 | File "/usr/lib/python2.7/site-packages/os_net_config/impl_ifcfg.py", line 1696, in apply
2019-02-28 01:33:47 | raise os_net_config.ConfigurationError(message)
2019-02-28 01:33:47 | os_net_config.ConfigurationError: Failure(s) occurred when applying configuration
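
A similar sketch can list which interfaces failed to get an address. The helper name `failed_interfaces` is hypothetical, and the pattern assumes the "Determining IP information for ... failed." wording shown in the excerpt above.

```python
import re

# Matches the failure message as it appears in the log above.
DHCP_FAIL_RE = re.compile(
    r"Determining IP information for (?P<iface>\S+?)\.\.\. failed\."
)


def failed_interfaces(log_text):
    """Return interface names that failed, in first-seen order, without repeats."""
    seen = []
    for match in DHCP_FAIL_RE.finditer(log_text):
        iface = match.group("iface")
        if iface not in seen:
            seen.append(iface)
    return seen


sample = (
    "2019-02-28 01:33:47 | Determining IP information for eth5... failed.\n"
    "2019-02-28 01:33:47 | Determining IP information for eth4... failed.\n"
    "2019-02-28 01:33:47 | Determining IP information for eth5... failed.\n"
)
print(failed_interfaces(sample))
```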

wes hayutin (weshayutin) wrote :

 openstack server list | grep -i error | wc -l 
184
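
Note that `grep -i error` also counts any server whose name happens to contain "error". A stricter count can be sketched from the JSON output (`openstack server list -f json`), assuming each entry carries `ID`, `Name` and `Status` keys. The function name and the sample data below are made up for illustration.

```python
import json


def error_servers(server_list_json):
    """Return the entries whose Status is ERROR from a JSON server listing."""
    servers = json.loads(server_list_json)
    return [s for s in servers if str(s.get("Status", "")).upper() == "ERROR"]


# Hypothetical sample, shaped like the assumed JSON output.
sample = json.dumps([
    {"ID": "id-1", "Name": "baremetal-0", "Status": "ERROR"},
    {"ID": "id-2", "Name": "error-prone-name", "Status": "ACTIVE"},
    {"ID": "id-3", "Name": "baremetal-1", "Status": "ERROR"},
])
broken = error_servers(sample)
print(len(broken), [s["ID"] for s in broken])
```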

Gabriele Cerami (gcerami) wrote :

Proposing https://review.rdoproject.org/r/19048 to include cleanup of servers in ERROR state in the periodic cleanup.

wes hayutin (weshayutin) on 2019-03-06
Changed in tripleo:
status: Triaged → Fix Released
Ronelle Landy (rlandy) wrote :

Reopening this bug. We hit the port failures even when the stack list is clean,
possibly when one stack is being created while another is being deleted.
Reopening to monitor this.

example: https://logs.rdoproject.org/08/653408/1/openstack-check/tripleo-ci-centos-7-ovb-3ctlr_1comp-featureset001/5c9b021/logs/undercloud/home/zuul/overcloud_deploy.log.txt.gz#_2019-04-17_13_07_08

Changed in tripleo:
status: Fix Released → In Progress
wes hayutin (weshayutin) wrote :

DHCP should NOT hand out an IP that is already allocated. AFAICT the heat stacks are deleted as much as possible during a CI run [1]; additionally, cleanup scripts are running in the background.

[1] https://github.com/rdo-infra/review.rdoproject.org-config/blob/master/roles/ovb-manage/tasks/ovb-delete-stack.yml#L33-L71

IMHO we have a bug in neutron dhcp
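
One way to check this claim would be to dump the ports on the tenant network and look for an IP held by more than one MAC. This is a rough sketch only: the `(port_id, mac, ip)` tuple layout and the sample values are assumptions for illustration, not real output from any client.

```python
from collections import defaultdict


def duplicate_ips(ports):
    """Map each IP held by more than one MAC to the sorted list of those MACs.

    `ports` is an iterable of (port_id, mac, ip) tuples (assumed layout).
    """
    macs_by_ip = defaultdict(set)
    for _port_id, mac, ip in ports:
        macs_by_ip[ip].add(mac.lower())
    return {ip: sorted(macs) for ip, macs in macs_by_ip.items() if len(macs) > 1}


# Hypothetical port dump; only the first MAC and the IP come from the log above.
sample_ports = [
    ("port-a", "BE:E5:4F:B9:21:B0", "172.18.0.79"),
    ("port-b", "fa:16:3e:00:00:01", "172.18.0.79"),
    ("port-c", "fa:16:3e:00:00:02", "172.18.0.80"),
]
print(duplicate_ips(sample_ports))
```

An empty result would suggest the conflicting MAC belongs to something neutron no longer tracks, for example a leftover from a stack that was being deleted.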
