Bug #1599858 “CI jobs failing with “Message: No valid host was f...” : Bugs : tripleo

Revision history for this message

Derek Higgins (derekh) wrote on 2016-07-07:

#1

Download full text (4.9 KiB)

From postci log, one of the node failed to deploy e.g.
http://logs.openstack.org/34/335434/3/check-tripleo/gate-tripleo-ci-centos-7-ovb-ha/085ed2a/logs/postci.txt.gz

The times variuos nodes were created

11:33:54 979f33b0-0df2-4565-b475-25c87296af78
11:34:18 b49a7770-c4db-44db-a822-7b3ee631b58a OK
11:34:28 9a37c72f-eed2-4dfe-a6ef-11d336be716b OK
11:34:36 233d96e9-5ff4-4e16-b730-5cb740d10d51 OK
11:36:17 3bee4016-f6b8-4239-853e-b36d5955667c
11:36:32 37cfeee2-d694-4a39-8421-e84348bd6906
11:36:49 393d0386-c9e1-4549-9917-6803676497a6
11:37:23 d9965837-28fa-4cf9-8d1c-86c656f7810b
11:38:15 6eabef07-f41d-4d81-92ae-59da36e828f0 ERROR

The very first on in this case failed, the nova logs show ironic waiting for some time

nova/nova-compute.log:2016-07-07 11:35:43.132 3716 DEBUG nova.virt.ironic.driver [-] [instance: 979f33b0-0df2-4565-b475-25c87296af78] Still waiting for ironic node 2ebfe30e-df9b-4383-81d1-61b5c3443810 to become ACTIVE: power_state="power off", target_power_state=None, provision_state="deploying", target_provision_state="active" _log_ironic_polling /usr/lib/python2.7/site-packages/nova/virt/ironic/driver.py:118

until eventually
nova/nova-compute.log:2016-07-07 11:35:45.146 3716 ERROR oslo.service.loopingcall InstanceDeployFailure: Failed to provision instance 979f33b0-0df2-4565-b475-25c87296af78: Timeout reached while waiting for callback for node 2ebfe30e-df9b-4383-81d1-61b5c3443810

After this all of the retries (from heat?) fail
grep "start: 4, end: 0" nova/nova-scheduler.log
2016-07-07 11:36:18.069 1929 INFO nova.filters [req-6e2bcd70-d5e0-4ffe-af60-6bbd9a227eee 620cdc7716ad4b199de8315d3e3199b7 57bbfbf0ad5b4d46af40c681bbf66da9 - - -] Filtering removed all hosts for the request with instance ID '3bee4016-f6b8-4239-853e-b36d5955667c'. Filter results: ['RetryFilter: (start: 4, end: 4)', 'TripleOCapabilitiesFilter: (start: 4, end: 4)', 'ComputeCapabilitiesFilter: (start: 4, end: 4)', 'AvailabilityZoneFilter: (start: 4, end: 4)', 'RamFilter: (start: 4, end: 0)']
2016-07-07 11:36:32.717 1929 INFO nova.filters [req-8f78197c-1676-4d41-9004-6514106914e5 620cdc7716ad4b199de8315d3e3199b7 57bbfbf0ad5b4d46af40c681bbf66da9 - - -] Filtering removed all hosts for...

From postci log, one of the node failed to deploy e.g.
http://logs.openstack.org/34/335434/3/check-tripleo/gate-tripleo-ci-centos-7-ovb-ha/085ed2a/logs/postci.txt.gz

The times variuos nodes were created

11:33:54 979f33b0-0df2-4565-b475-25c87296af78 
11:34:18 b49a7770-c4db-44db-a822-7b3ee631b58a OK
11:34:28 9a37c72f-eed2-4dfe-a6ef-11d336be716b OK
11:34:36 233d96e9-5ff4-4e16-b730-5cb740d10d51 OK
11:36:17 3bee4016-f6b8-4239-853e-b36d5955667c
11:36:32 37cfeee2-d694-4a39-8421-e84348bd6906
11:36:49 393d0386-c9e1-4549-9917-6803676497a6
11:37:23 d9965837-28fa-4cf9-8d1c-86c656f7810b
11:38:15 6eabef07-f41d-4d81-92ae-59da36e828f0 ERROR

The very first on in this case failed, the nova logs show ironic waiting for some time

nova/nova-compute.log:2016-07-07 11:35:43.132 3716 DEBUG nova.virt.ironic.driver [-] [instance: 979f33b0-0df2-4565-b475-25c87296af78] Still waiting for ironic node 2ebfe30e-df9b-4383-81d1-61b5c3443810 to become ACTIVE: power_state="power off", target_power_state=None, provision_state="deploying", target_provision_state="active" _log_ironic_polling /usr/lib/python2.7/site-packages/nova/virt/ironic/driver.py:118

until eventually
nova/nova-compute.log:2016-07-07 11:35:45.146 3716 ERROR oslo.service.loopingcall InstanceDeployFailure: Failed to provision instance 979f33b0-0df2-4565-b475-25c87296af78: Timeout reached while waiting for callback for node 2ebfe30e-df9b-4383-81d1-61b5c3443810

After this all of the retries (from heat?) fail
grep "start: 4, end: 0" nova/nova-scheduler.log
2016-07-07 11:36:18.069 1929 INFO nova.filters [req-6e2bcd70-d5e0-4ffe-af60-6bbd9a227eee 620cdc7716ad4b199de8315d3e3199b7 57bbfbf0ad5b4d46af40c681bbf66da9 - - -] Filtering removed all hosts for the request with instance ID '3bee4016-f6b8-4239-853e-b36d5955667c'. Filter results: ['RetryFilter: (start: 4, end: 4)', 'TripleOCapabilitiesFilter: (start: 4, end: 4)', 'ComputeCapabilitiesFilter: (start: 4, end: 4)', 'AvailabilityZoneFilter: (start: 4, end: 4)', 'RamFilter: (start: 4, end: 0)']
2016-07-07 11:36:32.717 1929 INFO nova.filters [req-8f78197c-1676-4d41-9004-6514106914e5 620cdc7716ad4b199de8315d3e3199b7 57bbfbf0ad5b4d46af40c681bbf66da9 - - -] Filtering removed all hosts for the request with instance ID '37cfeee2-d694-4a39-8421-e84348bd6906'. Filter results: ['RetryFilter: (start: 4, end: 4)', 'TripleOCapabilitiesFilter: (start: 4, end: 4)', 'ComputeCapabilitiesFilter: (start: 4, end: 4)', 'AvailabilityZoneFilter: (start: 4, end: 4)', 'RamFilter: (start: 4, end: 0)']
2016-07-07 11:36:50.154 1929 INFO nova.filters [req-3f8d4f25-cadf-48fc-8275-5c7225c644c5 620cdc7716ad4b199de8315d3e3199b7 57bbfbf0ad5b4d46af40c681bbf66da9 - - -] Filtering removed all hosts for the request with instance ID '393d0386-c9e1-4549-9917-6803676497a6'. Filter results: ['RetryFilter: (start: 4, end: 4)', 'TripleOCapabilitiesFilter: (start: 4, end: 4)', 'ComputeCapabilitiesFilter: (start: 4, end: 4)', 'AvailabilityZoneFilter: (start: 4, end: 4)', 'RamFilter: (start: 4, end: 0)']
2016-07-07 11:37:26.063 1929 INFO nova.filters [req-080933eb-059c-4bf6-9ff8-b593a779f338 620cdc7716ad4b199de8315d3e3199b7 57bbfbf0ad5b4d46af40c681bbf66da9 - - -] Filtering removed all hosts for the request with instance ID 'd9965837-28fa-4cf9-8d1c-86c656f7810b'. Filter results: ['RetryFilter: (start: 4, end: 4)', 'TripleOCapabilitiesFilter: (start: 4, end: 4)', 'ComputeCapabilitiesFilter: (start: 4, end: 4)', 'AvailabilityZoneFilter: (start: 4, end: 4)', 'RamFilter: (start: 4, end: 0)']
2016-07-07 11:38:16.460 1929 INFO nova.filters [req-a71110a9-b40c-4b20-8809-eeb927d648a5 620cdc7716ad4b199de8315d3e3199b7 57bbfbf0ad5b4d46af40c681bbf66da9 - - -] Filtering removed all hosts for the request with instance ID '6eabef07-f41d-4d81-92ae-59da36e828f0'. Filter results: ['RetryFilter: (start: 4, end: 4)', 'TripleOCapabilitiesFilter: (start: 4, end: 4)', 'ComputeCapabilitiesFilter: (start: 4, end: 4)', 'AvailabilityZoneFilter: (start: 4, end: 4)', 'RamFilter: (start: 4, end: 0)']

Revision history for this message

Derek Higgins (derekh) wrote on 2016-07-07:

#2

I believe whats going on here is that ARP traffic from one CI test is leaking into the tenant provision network of another CI test on the same cloud (rh2).

When the IPA image boots into the node, dhclient get an IP via dhcp,
One check dhcient does before assigning the IP address to the interface is send out an ARP request for the IP to make sure there are no collisions.

In the error case a node on another tenant network is responding to the ARP request and the deploy image doesn't get assigned the IP and just waits until nova/ironic times out.

I can't confirm if this is happening during CI or not, we would need access to the running environments that we don't have but I have come across it happening on an early deployment on rh2, I never got to the bottom of exactly what corner case triggers the problem.

I'm going to submit a patch to tripleo-ci so that each CI job uses a different subnet in its cloud, this should lower the probability of triggering the scenario above. If this improves the problem then it will indicate we are on the right track.

Steven Hardy (shardy) on 2016-07-08

Changed in tripleo:
milestone:	none → ongoing
assignee:	nobody → Derek Higgins (derekh)

Revision history for this message

Emilien Macchi (emilienm) wrote on 2017-07-05:

#3

This bug was last updated over 180 days ago, as tripleo is a fast moving project and we'd like to get the tracker down to currently actionable bugs, this is getting marked as Invalid. If the issue still exists, please feel free to reopen it.

Changed in tripleo:
status:	Confirmed → Invalid

tripleo

CI jobs failing with "Message: No valid host was found. There are not enough hosts available., Code: 500"

Bug Description

Other bug subscribers

Remote bug watches