EndpointNotFound raised by Pike n-cpu when running alongside Queens n-api

Bug #1775075 reported by Lee Yarwood
This bug affects 1 person
Affects                   Status        Importance  Assigned to  Milestone
OpenStack Compute (nova)  Invalid       Undecided   Unassigned
Queens                    Fix Released  High        Lee Yarwood

Bug Description

Description
===========
During a P (Pike) to Q (Queens) upgrade, n-cpu processes still running P are unable to find the volumev2 endpoint when running alongside Q n-api processes, due to the following change:

Update cinder in RequestContext service catalog
https://review.openstack.org/#/c/510947/

This results in failures any time the P n-cpu process attempts to interact with the volume service, for example during live migration (LM) from the node:

2018-06-02 00:19:17.683 1 WARNING nova.virt.libvirt.driver [req-3712be3d-b883-4fe1-bab0-83ee44bd5bb5 e16a043a84b14e2b8afbdd1b8677259f cb92ed750eac463faf8935cb137f1e60 - default default] [instance: 1a3800b3-9f75-4999-b726-a1afb0ebdd9b] Error monitoring migration: internalURL endpoint for volumev2 service named cinderv2 not found: EndpointNotFound: internalURL endpoint for volumev2 service named cinderv2 not found
2018-06-02 00:19:17.683 1 ERROR nova.virt.libvirt.driver [instance: 1a3800b3-9f75-4999-b726-a1afb0ebdd9b] Traceback (most recent call last):
2018-06-02 00:19:17.683 1 ERROR nova.virt.libvirt.driver [instance: 1a3800b3-9f75-4999-b726-a1afb0ebdd9b] File "/usr/lib/python2.7/site-packages/nova/virt/libvirt/driver.py", line 6817, in _live_migration
2018-06-02 00:19:17.683 1 ERROR nova.virt.libvirt.driver [instance: 1a3800b3-9f75-4999-b726-a1afb0ebdd9b] finish_event, disk_paths)
2018-06-02 00:19:17.683 1 ERROR nova.virt.libvirt.driver [instance: 1a3800b3-9f75-4999-b726-a1afb0ebdd9b] File "/usr/lib/python2.7/site-packages/nova/virt/libvirt/driver.py", line 6728, in _live_migration_monitor
2018-06-02 00:19:17.683 1 ERROR nova.virt.libvirt.driver [instance: 1a3800b3-9f75-4999-b726-a1afb0ebdd9b] migrate_data)
2018-06-02 00:19:17.683 1 ERROR nova.virt.libvirt.driver [instance: 1a3800b3-9f75-4999-b726-a1afb0ebdd9b] File "/usr/lib/python2.7/site-packages/nova/exception_wrapper.py", line 76, in wrapped
2018-06-02 00:19:17.683 1 ERROR nova.virt.libvirt.driver [instance: 1a3800b3-9f75-4999-b726-a1afb0ebdd9b] function_name, call_dict, binary)
2018-06-02 00:19:17.683 1 ERROR nova.virt.libvirt.driver [instance: 1a3800b3-9f75-4999-b726-a1afb0ebdd9b] File "/usr/lib/python2.7/site-packages/oslo_utils/excutils.py", line 220, in __exit__
2018-06-02 00:19:17.683 1 ERROR nova.virt.libvirt.driver [instance: 1a3800b3-9f75-4999-b726-a1afb0ebdd9b] self.force_reraise()
2018-06-02 00:19:17.683 1 ERROR nova.virt.libvirt.driver [instance: 1a3800b3-9f75-4999-b726-a1afb0ebdd9b] File "/usr/lib/python2.7/site-packages/oslo_utils/excutils.py", line 196, in force_reraise
2018-06-02 00:19:17.683 1 ERROR nova.virt.libvirt.driver [instance: 1a3800b3-9f75-4999-b726-a1afb0ebdd9b] six.reraise(self.type_, self.value, self.tb)
2018-06-02 00:19:17.683 1 ERROR nova.virt.libvirt.driver [instance: 1a3800b3-9f75-4999-b726-a1afb0ebdd9b] File "/usr/lib/python2.7/site-packages/nova/exception_wrapper.py", line 67, in wrapped
2018-06-02 00:19:17.683 1 ERROR nova.virt.libvirt.driver [instance: 1a3800b3-9f75-4999-b726-a1afb0ebdd9b] return f(self, context, *args, **kw)
2018-06-02 00:19:17.683 1 ERROR nova.virt.libvirt.driver [instance: 1a3800b3-9f75-4999-b726-a1afb0ebdd9b] File "/usr/lib/python2.7/site-packages/nova/compute/manager.py", line 218, in decorated_function
2018-06-02 00:19:17.683 1 ERROR nova.virt.libvirt.driver [instance: 1a3800b3-9f75-4999-b726-a1afb0ebdd9b] kwargs['instance'], e, sys.exc_info())
2018-06-02 00:19:17.683 1 ERROR nova.virt.libvirt.driver [instance: 1a3800b3-9f75-4999-b726-a1afb0ebdd9b] File "/usr/lib/python2.7/site-packages/oslo_utils/excutils.py", line 220, in __exit__
2018-06-02 00:19:17.683 1 ERROR nova.virt.libvirt.driver [instance: 1a3800b3-9f75-4999-b726-a1afb0ebdd9b] self.force_reraise()
2018-06-02 00:19:17.683 1 ERROR nova.virt.libvirt.driver [instance: 1a3800b3-9f75-4999-b726-a1afb0ebdd9b] File "/usr/lib/python2.7/site-packages/oslo_utils/excutils.py", line 196, in force_reraise
2018-06-02 00:19:17.683 1 ERROR nova.virt.libvirt.driver [instance: 1a3800b3-9f75-4999-b726-a1afb0ebdd9b] six.reraise(self.type_, self.value, self.tb)
2018-06-02 00:19:17.683 1 ERROR nova.virt.libvirt.driver [instance: 1a3800b3-9f75-4999-b726-a1afb0ebdd9b] File "/usr/lib/python2.7/site-packages/nova/compute/manager.py", line 206, in decorated_function
2018-06-02 00:19:17.683 1 ERROR nova.virt.libvirt.driver [instance: 1a3800b3-9f75-4999-b726-a1afb0ebdd9b] return function(self, context, *args, **kwargs)
2018-06-02 00:19:17.683 1 ERROR nova.virt.libvirt.driver [instance: 1a3800b3-9f75-4999-b726-a1afb0ebdd9b] File "/usr/lib/python2.7/site-packages/nova/compute/manager.py", line 5684, in _post_live_migration
2018-06-02 00:19:17.683 1 ERROR nova.virt.libvirt.driver [instance: 1a3800b3-9f75-4999-b726-a1afb0ebdd9b] migrate_data)
2018-06-02 00:19:17.683 1 ERROR nova.virt.libvirt.driver [instance: 1a3800b3-9f75-4999-b726-a1afb0ebdd9b] File "/usr/lib/python2.7/site-packages/nova/virt/libvirt/driver.py", line 7169, in post_live_migration
2018-06-02 00:19:17.683 1 ERROR nova.virt.libvirt.driver [instance: 1a3800b3-9f75-4999-b726-a1afb0ebdd9b] connector)
2018-06-02 00:19:17.683 1 ERROR nova.virt.libvirt.driver [instance: 1a3800b3-9f75-4999-b726-a1afb0ebdd9b] File "/usr/lib/python2.7/site-packages/nova/volume/cinder.py", line 235, in wrapper
2018-06-02 00:19:17.683 1 ERROR nova.virt.libvirt.driver [instance: 1a3800b3-9f75-4999-b726-a1afb0ebdd9b] res = method(self, ctx, *args, **kwargs)
2018-06-02 00:19:17.683 1 ERROR nova.virt.libvirt.driver [instance: 1a3800b3-9f75-4999-b726-a1afb0ebdd9b] File "/usr/lib/python2.7/site-packages/nova/volume/cinder.py", line 257, in wrapper
2018-06-02 00:19:17.683 1 ERROR nova.virt.libvirt.driver [instance: 1a3800b3-9f75-4999-b726-a1afb0ebdd9b] res = method(self, ctx, volume_id, *args, **kwargs)
2018-06-02 00:19:17.683 1 ERROR nova.virt.libvirt.driver [instance: 1a3800b3-9f75-4999-b726-a1afb0ebdd9b] File "/usr/lib/python2.7/site-packages/nova/volume/cinder.py", line 410, in initialize_connection
2018-06-02 00:19:17.683 1 ERROR nova.virt.libvirt.driver [instance: 1a3800b3-9f75-4999-b726-a1afb0ebdd9b] context).volumes.initialize_connection(volume_id, connector)
2018-06-02 00:19:17.683 1 ERROR nova.virt.libvirt.driver [instance: 1a3800b3-9f75-4999-b726-a1afb0ebdd9b] File "/usr/lib/python2.7/site-packages/nova/volume/cinder.py", line 116, in cinderclient
2018-06-02 00:19:17.683 1 ERROR nova.virt.libvirt.driver [instance: 1a3800b3-9f75-4999-b726-a1afb0ebdd9b] url = _SESSION.get_endpoint(auth, **service_parameters)
2018-06-02 00:19:17.683 1 ERROR nova.virt.libvirt.driver [instance: 1a3800b3-9f75-4999-b726-a1afb0ebdd9b] File "/usr/lib/python2.7/site-packages/keystoneauth1/session.py", line 947, in get_endpoint
2018-06-02 00:19:17.683 1 ERROR nova.virt.libvirt.driver [instance: 1a3800b3-9f75-4999-b726-a1afb0ebdd9b] return auth.get_endpoint(self, **kwargs)
2018-06-02 00:19:17.683 1 ERROR nova.virt.libvirt.driver [instance: 1a3800b3-9f75-4999-b726-a1afb0ebdd9b] File "/usr/lib/python2.7/site-packages/nova/context.py", line 78, in get_endpoint
2018-06-02 00:19:17.683 1 ERROR nova.virt.libvirt.driver [instance: 1a3800b3-9f75-4999-b726-a1afb0ebdd9b] region_name=region_name)
2018-06-02 00:19:17.683 1 ERROR nova.virt.libvirt.driver [instance: 1a3800b3-9f75-4999-b726-a1afb0ebdd9b] File "/usr/lib/python2.7/site-packages/positional/__init__.py", line 101, in inner
2018-06-02 00:19:17.683 1 ERROR nova.virt.libvirt.driver [instance: 1a3800b3-9f75-4999-b726-a1afb0ebdd9b] return wrapped(*args, **kwargs)
2018-06-02 00:19:17.683 1 ERROR nova.virt.libvirt.driver [instance: 1a3800b3-9f75-4999-b726-a1afb0ebdd9b] File "/usr/lib/python2.7/site-packages/keystoneauth1/access/service_catalog.py", line 344, in url_for
2018-06-02 00:19:17.683 1 ERROR nova.virt.libvirt.driver [instance: 1a3800b3-9f75-4999-b726-a1afb0ebdd9b] endpoint_id=endpoint_id).url
2018-06-02 00:19:17.683 1 ERROR nova.virt.libvirt.driver [instance: 1a3800b3-9f75-4999-b726-a1afb0ebdd9b] File "/usr/lib/python2.7/site-packages/positional/__init__.py", line 101, in inner
2018-06-02 00:19:17.683 1 ERROR nova.virt.libvirt.driver [instance: 1a3800b3-9f75-4999-b726-a1afb0ebdd9b] return wrapped(*args, **kwargs)
2018-06-02 00:19:17.683 1 ERROR nova.virt.libvirt.driver [instance: 1a3800b3-9f75-4999-b726-a1afb0ebdd9b] File "/usr/lib/python2.7/site-packages/keystoneauth1/access/service_catalog.py", line 407, in endpoint_data_for
2018-06-02 00:19:17.683 1 ERROR nova.virt.libvirt.driver [instance: 1a3800b3-9f75-4999-b726-a1afb0ebdd9b] raise exceptions.EndpointNotFound(msg)
2018-06-02 00:19:17.683 1 ERROR nova.virt.libvirt.driver [instance: 1a3800b3-9f75-4999-b726-a1afb0ebdd9b] EndpointNotFound: internalURL endpoint for volumev2 service named cinderv2 not found
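The last frames of the traceback can be illustrated with a small, self-contained sketch. This is not the keystoneauth1 or nova code; `url_for`, the `EndpointNotFound` stand-in, the catalog layout and the `cinder.example` URLs are all assumptions made up for illustration. It only shows why a lookup for a volumev2/cinderv2 endpoint fails once that service type is missing from the catalog carried in the request context.

```python
class EndpointNotFound(Exception):
    """Stand-in for the keystoneauth1 exception seen in the traceback."""


def url_for(catalog, service_type, service_name, interface):
    """Return the first endpoint URL matching type, name and interface.

    Hypothetical helper: accepts 'internalURL' style names and compares
    them against 'internal' style interface values in the catalog.
    """
    wanted = interface[:-3] if interface.endswith("URL") else interface
    for service in catalog:
        if service.get("type") != service_type:
            continue
        if service.get("name") != service_name:
            continue
        for endpoint in service.get("endpoints", []):
            if endpoint.get("interface") == wanted:
                return endpoint["url"]
    raise EndpointNotFound(
        "%s endpoint for %s service named %s not found"
        % (interface, service_type, service_name))


# Illustrative catalog as a P n-api might have passed it on (assumed data).
PIKE_CATALOG = [
    {"type": "volumev2", "name": "cinderv2",
     "endpoints": [{"interface": "internal",
                    "url": "http://cinder.example:8776/v2"}]},
    {"type": "volumev3", "name": "cinderv3",
     "endpoints": [{"interface": "internal",
                    "url": "http://cinder.example:8776/v3"}]},
]

# The same catalog with the volumev2 entry dropped, as a P n-cpu would
# receive it from a Q n-api in this bug.
QUEENS_CONTEXT_CATALOG = [s for s in PIKE_CATALOG if s["type"] != "volumev2"]
```

With these definitions, looking up `volumev2` / `cinderv2` / `internalURL` in `PIKE_CATALOG` returns the v2 URL, while the same lookup against `QUEENS_CONTEXT_CATALOG` raises `EndpointNotFound` with the message shown in the log above.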

Steps to reproduce
==================
* Starting with a P environment
* Upgrade n-api/n-schd/n-cond etc to Q
* Attempt to attach a volume to an instance on a P n-cpu host

Expected result
===============
volumev2 endpoint found and used.

Actual result
=============
EndpointNotFound

Environment
===========
1. Exact version of OpenStack you are running. See the following
  list for all releases: http://docs.openstack.org/releases/

   Queens control plane, Pike computes.

2. Which hypervisor did you use?
   (For example: Libvirt + KVM, Libvirt + XEN, Hyper-V, PowerKVM, ...)
   What's the version of that?

   Libvirt + KVM

3. Which storage type did you use?
   (For example: Ceph, LVM, GPFS, ...)
   What's the version of that?

   N/A

4. Which networking type did you use?
   (For example: nova-network, Neutron with OpenVSwitch, ...)

   N/A

Logs & Configs
==============
LM between OSP 12 and OSP 13 computes fail during major upgrade
https://bugzilla.redhat.com/show_bug.cgi?id=1585656

OpenStack Infra (hudson-openstack) wrote : Fix proposed to nova (stable/queens)

Fix proposed to branch: stable/queens
Review: https://review.openstack.org/572213

Matt Riedemann (mriedem)
tags: added: upgrades volumes
Changed in nova:
assignee: nobody → Lee Yarwood (lyarwood)
status: New → In Progress
status: In Progress → New
assignee: Lee Yarwood (lyarwood) → nobody
Lee Yarwood (lyarwood) wrote :

The original report did not highlight that this requires the P computes to be using a modified [cinder]catalog_info that specifically looks for a cinderv2 type endpoint, as is the default in TripleO:

https://github.com/openstack/tripleo-heat-templates/blob/351c320c191252c496f6911f972d31390469585b/puppet/services/nova-base.yaml#L226

You could argue that this is a config issue, but IMHO the fact that this is possible in P means we should continue to provide the type in the request context service catalog in Q. That allows any P computes to LM instances and eventually upgrade to Q, where the default has now changed in TripleO:

https://github.com/openstack/tripleo-heat-templates/blob/3570a94a69e05af23cee11fbfaf3822538f75023/puppet/services/nova-base.yaml#L228
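To check whether a given P compute is affected, something along these lines could be used to read [cinder]catalog_info out of a nova.conf. This is only a sketch: `cinder_catalog_info` and `needs_volumev2` are hypothetical helper names, and the fallback value of `volumev3:cinderv3:publicURL` is an assumption about the option's default when it is unset, so verify it against the release actually deployed.

```python
import configparser

# Assumed default when [cinder]catalog_info is not set; check your release.
ASSUMED_DEFAULT = "volumev3:cinderv3:publicURL"


def cinder_catalog_info(nova_conf_text, default=ASSUMED_DEFAULT):
    """Return (service_type, service_name, interface) from nova.conf text."""
    # Interpolation is disabled so literal '%' characters elsewhere in the
    # file are not treated as substitutions; strict=False tolerates
    # duplicate sections or options.
    parser = configparser.ConfigParser(interpolation=None, strict=False)
    parser.read_string(nova_conf_text)
    value = parser.get("cinder", "catalog_info", fallback=default)
    service_type, service_name, interface = value.split(":")
    return service_type, service_name, interface


def needs_volumev2(nova_conf_text):
    """True if this config asks the service catalog for a volumev2 endpoint."""
    return cinder_catalog_info(nova_conf_text)[0] == "volumev2"


# Example nova.conf fragment using the TripleO Pike value from this bug.
TRIPLEO_PIKE_CONF = """
[cinder]
catalog_info = volumev2:cinderv2:internalURL
"""
```

Here `needs_volumev2(TRIPLEO_PIKE_CONF)` is true, so a compute with that config would hit this bug against an unfixed Q n-api. A config without the option falls back to the assumed default and is reported as unaffected.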

OpenStack Infra (hudson-openstack) wrote : Fix merged to nova (stable/queens)

Reviewed: https://review.openstack.org/572213
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=c7d87f6691bbc39daca918729b769cc2d2000abd
Submitter: Zuul
Branch: stable/queens

commit c7d87f6691bbc39daca918729b769cc2d2000abd
Author: Lee Yarwood <email address hidden>
Date: Mon Jun 4 22:12:18 2018 +0100

    Allow cinderv2 endpoints within the request context catalog

    This partially reverts commit a03976ee892333720b2227f607a5ddbb77163fea.

    During a P to Q upgrade it is common to have the control services
    upgraded to Q ahead of the computes. During this time the Q n-api
    services will provide request contexts over the wire to these P computes
    where the service catalog held within does not contain any cinderv2 type
    endpoints, as cinderv2 support has been removed from Nova in Q.

    This becomes a problem if the P computes are specifically looking for a
    cinderv2 type endpoint via [cinder]catalog_info within nova.conf. For
    example with TripleO deployed Pike environments this has a default value
    of 'volumev2:cinderv2:internalURL'.

    This change ensures cinderv2 type endpoints are still provided by Queens
    n-api control services to Pike computes during an upgrade. This is only
    required on stable/queens as we only support N-1 running computes during
    an upgrade and so we don't need to land anything in R to support P
    computes.

    Closes-bug: #1775075
    Change-Id: I45299df2bf095c12bfce5b1ac3e5460a11dd0131
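The effect of the change described in this commit message can be sketched as a filter over the service catalog by service type. The snippet below is an illustration under stated assumptions, not the nova code: `filter_catalog` is a hypothetical helper and both type tuples are illustrative lists rather than values copied from the nova source. The only point is that adding `volumev2` back to the allowed types lets a cinderv2 entry survive into the catalog that is sent to P computes.

```python
# Illustrative allow-lists (assumptions, not copied from nova).
ALLOWED_TYPES_BEFORE_FIX = ("image", "volumev3", "key-manager",
                            "placement", "network")
ALLOWED_TYPES_AFTER_FIX = ALLOWED_TYPES_BEFORE_FIX + ("volumev2",)


def filter_catalog(service_catalog, allowed_types):
    """Keep only catalog entries whose 'type' is in allowed_types."""
    return [s for s in service_catalog if s.get("type") in allowed_types]


# Minimal example catalog; names are the ones mentioned in this bug.
FULL_CATALOG = [
    {"type": "volumev2", "name": "cinderv2"},
    {"type": "volumev3", "name": "cinderv3"},
    {"type": "compute", "name": "nova"},
]

before = filter_catalog(FULL_CATALOG, ALLOWED_TYPES_BEFORE_FIX)
after = filter_catalog(FULL_CATALOG, ALLOWED_TYPES_AFTER_FIX)
```

In this sketch `before` contains only the volumev3 entry, which is what a P compute configured for `volumev2:cinderv2:internalURL` cannot use, while `after` also carries the volumev2 entry.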

Matt Riedemann (mriedem)
Changed in nova:
status: New → Invalid
OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/nova 17.0.6

This issue was fixed in the openstack/nova 17.0.6 release.
