Allow metadata agent to make calls to more than one nova_metadata_ip

Bug #1620279 reported by Slawek Kaplonski
This bug affects 1 person
Affects: neutron
Status: Won't Fix
Importance: Undecided
Assigned to: Unassigned

Bug Description

Currently the metadata agent config has an option to set the IP address of the nova metadata service (nova_metadata_ip).
There can be more than one nova-api service in a cluster. In that case, if the configured nova metadata IP returns e.g. error 500, that error is passed back to the instance, even though all the other nova-api services may be working fine and a call to another one would return proper metadata.

So the proposal is to change the nova_metadata_ip string option to a list of IP addresses and to change the metadata agent so that it tries one of the configured Nova services. If the response from that Nova service is not 200, the agent tries the next Nova service. If all Nova services fail, it returns the lowest error code it got from Nova (for example, if nova-api-1 returned 500 and nova-api-2 returned 404, the agent returns 404 to the VM).
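
A minimal sketch of the proposed fallback (illustrative only: the nova_metadata_ips option name, the proxy_request helper and the use of the requests library are assumptions, not actual agent code):

    # Rough sketch of the proposed behaviour; not real neutron code.
    import requests

    def proxy_request(path, headers, nova_metadata_ips, port=8775):
        worst_resp = None
        for ip in nova_metadata_ips:
            resp = requests.get('http://%s:%d%s' % (ip, port, path),
                                headers=headers)
            if resp.status_code == 200:
                return resp  # first healthy nova-api wins
            # keep the lowest error code seen so far (e.g. 404 over 500)
            if worst_resp is None or resp.status_code < worst_resp.status_code:
                worst_resp = resp
        return worst_resp  # every configured nova-api failed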

Tags: rfe
Revision history for this message
Miguel Angel Ajo (mangelajo) wrote :

This makes sense; if we also round-robin a bit, we enhance scalability as well.

Changed in neutron:
assignee: nobody → Slawek Kaplonski (slaweq)
Revision history for this message
Jakub Libosvar (libosvar) wrote :

@Miguel: I think that kind of scalability should be handled at the nova-api level by introducing load balancers or multiple handlers, not on the client side. :)

I'll just add here some thoughts:
 - typically, when there are multiple nova-apis, they are hidden behind a VIP
 - an LB using round-robin will try different nova-apis
 - guests retry calling the metadata API if they get an unsuccessful response multiple times
 - IIUC the aim of this RFE is to hide errors from the client, which in this case is the guest instance

Based on the above, would introducing a retry mechanism in the metadata agent have the same effect, while not requiring any configuration changes in our precious installers?

Revision history for this message
Miguel Angel Ajo (mangelajo) wrote :

A retry + load balancer in front of the nova-api would achieve exactly the same thing, but we would still need to add one or more options (retry count? incremental back-offs? etc.), which of course could be defaulted :)
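
For comparison, a minimal retry-with-back-off sketch against a single VIP (the retry_count and retry_backoff option names are hypothetical, just to show what would need to be configurable and could be defaulted):

    import time
    import requests

    def call_metadata(url, headers, retry_count=3, retry_backoff=1.0):
        resp = None
        for attempt in range(retry_count):
            resp = requests.get(url, headers=headers)
            if resp.status_code < 500:
                return resp  # success, or a client error worth passing on
            time.sleep(retry_backoff * (attempt + 1))  # incremental back-off
        return resp  # still failing after all retries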

John Schwarz (jschwarz)
Changed in neutron:
importance: Undecided → Wishlist
status: New → Confirmed
John Schwarz (jschwarz)
Changed in neutron:
status: Confirmed → New
importance: Wishlist → Undecided
Revision history for this message
Gary Kotton (garyk) wrote :

I do not think that this is a bug. The nova-api IP configured there can be the IP address of a VIP in front of the nova-apis.
We do not need to reinvent the wheel here.

Changed in neutron:
status: New → Won't Fix
Revision history for this message
Slawek Kaplonski (slaweq) wrote :

@Gary: I know that e.g. haproxy can do round robin between several nova-api services, but imagine a case where one nova-api returns error 500 for some reason (not always, but only for specific requests). In that case the end user will see 500 errors, for example in the cloud-init logs while booting a VM. Such errors could be hidden from the end user in the instance logs.
For private clouds and internal users it is maybe not a big problem, but for public cloud providers it is, because it is not good if the first thing a customer sees is error 500 (even if this error has no real impact on their service).

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to neutron (master)

Fix proposed to branch: master
Review: https://review.openstack.org/370727

Changed in neutron:
status: Won't Fix → In Progress
Revision history for this message
Brian Haley (brian-haley) wrote :

I would agree with Gary.

If a deployment is already large enough to have multiple nova-apis behind a VIP for all the public OpenStack API endpoints, they are going to just add one for port 8775 as well. And if one is generating 500 errors, the proxy will take it out of rotation. That's what we did in our public cloud.
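
For illustration only, a haproxy listener for the metadata port with health checking could look roughly like this (all names, addresses and the check path are made up, not taken from any real deployment):

    # illustrative haproxy backend for nova metadata on port 8775
    listen nova_metadata
        bind 192.0.2.10:8775
        balance roundrobin
        option httpchk GET /
        server nova-api-1 192.0.2.11:8775 check
        server nova-api-2 192.0.2.12:8775 check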

Changed in neutron:
status: In Progress → Won't Fix
assignee: Slawek Kaplonski (slaweq) → nobody
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Change abandoned on neutron (master)

Change abandoned by Slawek Kaplonski (<email address hidden>) on branch: master
Review: https://review.openstack.org/370727
Reason: as there are -2 votes, I think it is not worth continuing it
