Nova Placement API on IPv6 unreachable from compute nodes

Bug #1663187 reported by Gabriele Cerami
This bug affects 2 people
Affects | Status | Importance | Assigned to | Milestone
OpenStack Compute (nova) | Invalid | Undecided | Juan Antonio Osorio Robles |
tripleo | Fix Released | Critical | Juan Antonio Osorio Robles |

Bug Description

The logs at

http://logs.openstack.org/periodic/periodic-tripleo-ci-centos-7-ovb-updates/024c997/console.html#_2017-02-09_08_41_30_014769

show the error "No valid host was found".

The deployment completed and updated successfully.

Tags: ci
description: updated
Gabriele Cerami (gcerami) wrote :

The overcloud nova-conductor logs at

http://logs.openstack.org/periodic/periodic-tripleo-ci-centos-7-ovb-updates/024c997/logs/overcloud-controller-0/var/log/nova/nova-conductor.txt.gz#_2017-02-09_08_39_51_132

show the full error, but I don't see any root cause:
2017-02-09 08:39:51.132 243236 ERROR nova.conductor.manager [req-ef5efb8e-c34a-4a61-85e5-823333b66161 - - - - -] Failed to schedule instances
2017-02-09 08:39:51.132 243236 ERROR nova.conductor.manager Traceback (most recent call last):
2017-02-09 08:39:51.132 243236 ERROR nova.conductor.manager File "/usr/lib/python2.7/site-packages/nova/conductor/manager.py", line 866, in schedule_and_build_instances
2017-02-09 08:39:51.132 243236 ERROR nova.conductor.manager request_specs[0].to_legacy_filter_properties_dict())
2017-02-09 08:39:51.132 243236 ERROR nova.conductor.manager File "/usr/lib/python2.7/site-packages/nova/conductor/manager.py", line 597, in _schedule_instances
2017-02-09 08:39:51.132 243236 ERROR nova.conductor.manager hosts = self.scheduler_client.select_destinations(context, spec_obj)
2017-02-09 08:39:51.132 243236 ERROR nova.conductor.manager File "/usr/lib/python2.7/site-packages/nova/scheduler/utils.py", line 371, in wrapped
2017-02-09 08:39:51.132 243236 ERROR nova.conductor.manager return func(*args, **kwargs)
2017-02-09 08:39:51.132 243236 ERROR nova.conductor.manager File "/usr/lib/python2.7/site-packages/nova/scheduler/client/__init__.py", line 51, in select_destinations
2017-02-09 08:39:51.132 243236 ERROR nova.conductor.manager return self.queryclient.select_destinations(context, spec_obj)
2017-02-09 08:39:51.132 243236 ERROR nova.conductor.manager File "/usr/lib/python2.7/site-packages/nova/scheduler/client/__init__.py", line 37, in __run_method
2017-02-09 08:39:51.132 243236 ERROR nova.conductor.manager return getattr(self.instance, __name)(*args, **kwargs)
2017-02-09 08:39:51.132 243236 ERROR nova.conductor.manager File "/usr/lib/python2.7/site-packages/nova/scheduler/client/query.py", line 32, in select_destinations
2017-02-09 08:39:51.132 243236 ERROR nova.conductor.manager return self.scheduler_rpcapi.select_destinations(context, spec_obj)
2017-02-09 08:39:51.132 243236 ERROR nova.conductor.manager File "/usr/lib/python2.7/site-packages/nova/scheduler/rpcapi.py", line 129, in select_destinations
2017-02-09 08:39:51.132 243236 ERROR nova.conductor.manager return cctxt.call(ctxt, 'select_destinations', **msg_args)
2017-02-09 08:39:51.132 243236 ERROR nova.conductor.manager File "/usr/lib/python2.7/site-packages/oslo_messaging/rpc/client.py", line 169, in call
2017-02-09 08:39:51.132 243236 ERROR nova.conductor.manager retry=self.retry)
2017-02-09 08:39:51.132 243236 ERROR nova.conductor.manager File "/usr/lib/python2.7/site-packages/oslo_messaging/transport.py", line 97, in _send
2017-02-09 08:39:51.132 243236 ERROR nova.conductor.manager timeout=timeout, retry=retry)
2017-02-09 08:39:51.132 243236 ERROR nova.conductor.manager File "/usr/lib/python2.7/site-packages/oslo_messaging/_drivers/amqpdriver.py", line 458, in send
2017-02-09 08:39:51.132 243236 ERROR nova.conductor.manager retry=retry)
2017-02-0...

Emilien Macchi (emilienm) wrote :

I investigated this one and it happens for all ovb-updates jobs. I'm almost sure it's related to IPv6.

The Nova Placement API, which runs on the controller as a WSGI application under Apache, is bound to an IPv6 interface, but somehow the nova-compute process fails to reach the service:
http://logs.openstack.org/95/414395/5/check-tripleo/gate-tripleo-ci-centos-7-ovb-updates/0e8a0f5/logs/overcloud-novacompute-0/var/log/nova/nova-compute.txt.gz#_2017-02-11_19_16_11_161

The error there is "Placement API service is not responding".
It is a networking error, not a Keystone authorization problem or anything else.

We need to investigate whether the compute node can reach this service through the public endpoint (IPv6); otherwise we might need to run the service on IPv4 by using the internal endpoint. I'm testing that in https://review.openstack.org/#/c/432761/
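
For anyone who wants to run the reachability check by hand from a compute node, here is a minimal sketch using only the Python standard library. It just attempts a TCP connection to a host and port over a chosen address family. The function name and the sample address and port in the usage comment are hypothetical; take the real values from the placement endpoints in the service catalog.

```python
import socket


def can_connect(host, port, family=socket.AF_UNSPEC, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds.

    family restricts name resolution to socket.AF_INET (IPv4) or
    socket.AF_INET6 (IPv6); AF_UNSPEC tries whatever resolves.
    """
    try:
        infos = socket.getaddrinfo(host, port, family, socket.SOCK_STREAM)
    except socket.gaierror:
        # The host has no address in the requested family.
        return False
    for af, socktype, proto, _canonname, sockaddr in infos:
        try:
            with socket.socket(af, socktype, proto) as sock:
                sock.settimeout(timeout)
                sock.connect(sockaddr)
                return True
        except OSError:
            # Refused, unreachable, or timed out: try the next address.
            continue
    return False


# Hypothetical usage on a compute node (address and port are placeholders):
#   can_connect("2001:db8:fd00:1000::10", 8778, socket.AF_INET6)
```

A False result only says that the TCP connection could not be made from that node. It does not tell you whether the cause is routing, a firewall rule, or Apache not listening on that address.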

Changed in tripleo:
importance: High → Critical
assignee: nobody → Emilien Macchi (emilienm)
milestone: ongoing → ocata-rc1
assignee: Emilien Macchi (emilienm) → nobody
tags: added: alert
summary: - CI: periodic master updates fails to spawn pingtest instance
+ Nova Placement API on IPv6 unreachable from compute nodes
Emilien Macchi (emilienm) wrote :

I confirm that https://review.openstack.org/#/c/432761/ helps to fix IPv6 deployments. I'm not sure whether I took the right approach, though. Help is welcome!

Changed in tripleo:
assignee: nobody → Emilien Macchi (emilienm)
status: Triaged → In Progress
Juan Antonio Osorio Robles (juan-osorio-robles) wrote :

Seems to me that this is also a nova issue, as the interface to be used should be configurable.

OpenStack Infra (hudson-openstack) wrote : Fix proposed to nova (master)

Fix proposed to branch: master
Review: https://review.openstack.org/432855

Changed in nova:
assignee: nobody → Juan Antonio Osorio Robles (juan-osorio-robles)
status: New → In Progress
OpenStack Infra (hudson-openstack) wrote : Change abandoned on nova (master)

Change abandoned by Juan Antonio Osorio Robles (<email address hidden>) on branch: master
Review: https://review.openstack.org/432855

Juan Antonio Osorio Robles (juan-osorio-robles) wrote :

Somebody had uploaded a fix before me: https://review.openstack.org/#/c/426163/6

OpenStack Infra (hudson-openstack) wrote : Fix proposed to tripleo-heat-templates (master)

Fix proposed to branch: master
Review: https://review.openstack.org/432864

Changed in tripleo:
assignee: Emilien Macchi (emilienm) → Juan Antonio Osorio Robles (juan-osorio-robles)
Chris Dent (cdent) wrote :

There is presumably something weirder going on than "the endpoint is simply not reachable, let's try a different one". Because the placement API only exposes itself as a WSGI application, any problem reaching it lies either in how it has been listed in keystone or in how the WSGI server (in this case Apache with mod_wsgi) has been configured, both of which are entirely in the domain (in this case) of the puppet files. So something is weird with how the network is set up, and working around that seems like it is just hiding a bug, or at least a misunderstanding of how the network is configured. Is the goal that the boxes hosting n-cpu and nova-scheduler can reach the IPv6 network?
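
One quick way to look at the "how it has been listed in keystone" side is to check which address family each registered placement endpoint uses. The sketch below is only an illustration with the standard library: the URLs are made-up documentation-range addresses, port 8778 is a placeholder, and the helper name is hypothetical. In practice the URLs would come from the endpoint list of the deployment.

```python
import ipaddress
from urllib.parse import urlsplit


def endpoint_family(url):
    """Return "IPv6", "IPv4", or "hostname" for the host part of a URL."""
    host = urlsplit(url).hostname  # brackets around IPv6 literals are stripped
    if host is None:
        return "hostname"
    try:
        addr = ipaddress.ip_address(host)
    except ValueError:
        # Not an IP literal, so it has to be resolved through DNS or /etc/hosts.
        return "hostname"
    return "IPv6" if addr.version == 6 else "IPv4"


# Hypothetical placement endpoints, one per interface.
placement_endpoints = {
    "public": "http://[2001:db8:fd00:1000::10]:8778/placement",
    "internal": "http://192.0.2.10:8778/placement",
}

for interface, url in placement_endpoints.items():
    print(interface, endpoint_family(url))
```

If the interface that nova-compute picks resolves to a network the compute node is not attached to, the service will look unreachable even though keystone and Apache are both configured consistently.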

In any case, this patch which you may already know about may be helpful: https://review.openstack.org/#/c/426163/

OpenStack Infra (hudson-openstack) wrote : Change abandoned on tripleo-heat-templates (master)

Change abandoned by Emilien Macchi (<email address hidden>) on branch: master
Review: https://review.openstack.org/432761
Reason: https://review.openstack.org/#/c/432864/

Changed in nova:
status: In Progress → Invalid
Ben Nemec (bnemec) wrote :

I suspect the reason this breaks on our IPv6 jobs is that our compute nodes are not directly connected to the public network. I think we get away with it on IPv4 because we have forwarding rules set up on the undercloud that indirectly allow the compute nodes access to the public API endpoints, but I suspect those rules don't allow IPv4-to-IPv6 forwarding. Also, that forwarding is not a requirement, and it's not something we should rely on because it sends all of that traffic over the provisioning network.

So I think on the tripleo side we just need to make use of the new nova option and move placement to the internal network so the compute nodes have direct access.
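
To make the effect of that option concrete, here is a rough sketch of how an interface setting changes which catalog URL a client ends up with. This is not keystoneauth or nova code: `pick_endpoint` is a hypothetical stand-in, the catalog and its addresses are made up, and the option name `os_interface` under `[placement]` is my assumption, since the thread only calls it "the new nova option". The default of "public" follows what the merged commit message in this report says about keystoneauth's defaults.

```python
import configparser

# Hypothetical nova.conf fragment; the option name is an assumption.
NOVA_CONF = """
[placement]
os_interface = internal
"""

# Hypothetical service catalog using documentation-range addresses.
CATALOG = [
    {
        "type": "placement",
        "endpoints": [
            {"interface": "public",
             "url": "http://[2001:db8:fd00:1000::10]:8778/placement"},
            {"interface": "internal",
             "url": "http://192.0.2.10:8778/placement"},
        ],
    },
]


def pick_endpoint(catalog, service_type, interface="public"):
    """Return the URL registered for service_type on the given interface."""
    for entry in catalog:
        if entry["type"] != service_type:
            continue
        for endpoint in entry["endpoints"]:
            if endpoint["interface"] == interface:
                return endpoint["url"]
    raise LookupError("no %s endpoint for %s" % (interface, service_type))


cfg = configparser.ConfigParser()
cfg.read_string(NOVA_CONF)
# Fall back to "public" when the option is not set.
interface = cfg.get("placement", "os_interface", fallback="public")

print("default  :", pick_endpoint(CATALOG, "placement"))
print("from conf:", pick_endpoint(CATALOG, "placement", interface))
```

With the option unset the client talks to the public endpoint, which the compute nodes may not be able to reach directly; with it set to internal, the traffic stays on a network the compute nodes are attached to.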

OpenStack Infra (hudson-openstack) wrote : Fix merged to tripleo-heat-templates (master)

Reviewed: https://review.openstack.org/432864
Committed: https://git.openstack.org/cgit/openstack/tripleo-heat-templates/commit/?id=ca843e18824f8ff5bfe2f576ad2afb894a16d2f4
Submitter: Jenkins
Branch: master

commit ca843e18824f8ff5bfe2f576ad2afb894a16d2f4
Author: Juan Antonio Osorio Robles <email address hidden>
Date: Mon Feb 13 08:20:16 2017 +0200

    Configure the placement API's interface to use the internal endpoint

    Due to the keystoneauth library's defaults, it uses the public interface
    currently. This is not desirable in most cases (specially when using
    network isolation); so we set it to use the internal one.

    Change-Id: Ic222a2b734f4d512349fd8556aa2864b13a1eb07
    Depends-On: I1c7fd3a32d04e2fafb3820d1c1f221f45c613c83
    Closes-Bug: #1663187

Changed in tripleo:
status: In Progress → Fix Released
tags: removed: alert
OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/tripleo-heat-templates 6.0.0.0rc1

This issue was fixed in the openstack/tripleo-heat-templates 6.0.0.0rc1 release candidate.
