[RFE] Add distributed datapath for metadata

Bug #1933222 reported by LIU Yulong
This bug affects 1 person
Affects: neutron
Status: In Progress
Importance: Wishlist
Assigned to: Unassigned
Milestone: none listed

Bug Description

When instances boot, they retrieve metadata from Nova through a path made
up of Neutron virtual switches (bridges), virtual devices, namespaces and
metadata agents. Beyond that, the metadata agent has no other function.
In large-scale deployments, a large number of metadata agents wastes host
resources and message queue capacity, because the agents start external
processes in proportion to the number of users' resources and report state
to the Neutron server as a heartbeat.

How many metadata agents should run in a large-scale cloud deployment?
There is no exact answer to this question. Cloud operators may set up a
metadata agent on every host, much like the DHCP agent. Config drive can be
an alternative way for clouds to supply metadata to VMs. But what if users
do not want to add a CD-ROM device to the VM?

So I'd like to request an agent extension for the Neutron openvswitch
agent that makes the metadata datapath distributed.

Tags: rfe-approved
Revision history for this message
LIU Yulong (dragon889) wrote :

As we can see, the metadata datapath is very long and passes through many devices, namespaces and agents. If any point on the path fails, for example an agent goes down or an external process dies, it affects not only that host but also every related host on which new VMs will boot.

Akihiro Motoki (amotoki)
tags: added: rfe
Changed in neutron:
importance: Undecided → Wishlist
Revision history for this message
Slawek Kaplonski (slaweq) wrote :

AFAIK the metadata agent translates the IP address and network/router_id to the device_id (the instance ID in Nova's DB) and passes the request to Nova.
If that were distributed and done by the OVS agent, we probably wouldn't need to ask the Neutron server for the device_id every time, as we already have it in external_ids in OVS, IIRC. And that's good.
But who would pass requests to Nova? Do you want to spawn another process on each compute node, or will the OVS agent do that directly? Can you elaborate on how you want to implement that on the compute nodes?

Revision history for this message
LIU Yulong (dragon889) wrote :

Packets from VMs that try to access metadata at 169.254.169.254:80 will be sent to a new OVS bridge called br-meta. In br-meta, the VM's fixed_IP + MAC (source) will be translated to a local META_IP + MAC, while the destination 169.254.169.254 is translated to the local META gateway IP, which resides on tap-meta. The packets are then sent to the tap-meta device, which resides in br-meta. A host-only haproxy listens on port 80 via tap-meta and then transmits the request to nova-metadata-api.
For packets going back to the VM, the META_IP + MAC is translated back to the VM's fixed_IP + MAC.
ARP responder flows will be added in br-meta to answer ARP requests from tap-meta for META_IPs.

The META_IP range is shared by all VMs on one host, and the same META_IP range is used on different hosts, because IPs in this range never leave the host.
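
The translation above amounts to a per-host table that maps each VM's (fixed_IP, MAC) pair to one address from the META_IP range and back again. A minimal sketch of such a table follows. It is an illustration only: the class name, method names and the 198.51.100.0/24 range are hypothetical and are not taken from this RFE or its spec.

```python
import ipaddress
from collections import deque


class MetaAddressMap:
    """Per-host table: (fixed_ip, mac) <-> META_IP. Hypothetical helper."""

    def __init__(self, cidr="198.51.100.0/24"):
        # The CIDR is a placeholder (a documentation range), not the real
        # META_IP range. Every host can reuse the same range because these
        # addresses never leave the host.
        hosts = list(ipaddress.ip_network(cidr).hosts())
        self.gateway = hosts[0]            # META gateway IP on tap-meta
        self._free = deque(hosts[1:])      # addresses available to VMs
        self._vm_to_meta = {}
        self._meta_to_vm = {}

    def allocate(self, fixed_ip, mac):
        """Return the META_IP for a VM port, allocating one if needed."""
        key = (fixed_ip, mac.lower())
        if key in self._vm_to_meta:
            return self._vm_to_meta[key]
        if not self._free:
            raise RuntimeError("META_IP range exhausted on this host")
        meta_ip = self._free.popleft()
        self._vm_to_meta[key] = meta_ip
        self._meta_to_vm[meta_ip] = key
        return meta_ip

    def lookup(self, meta_ip):
        """Reverse translation for reply packets: META_IP -> (fixed_ip, mac)."""
        return self._meta_to_vm[ipaddress.ip_address(meta_ip)]

    def release(self, fixed_ip, mac):
        """Free the META_IP when the VM port leaves the host."""
        meta_ip = self._vm_to_meta.pop((fixed_ip, mac.lower()), None)
        if meta_ip is not None:
            del self._meta_to_vm[meta_ip]
            self._free.append(meta_ip)
```

Keying on both the fixed IP and the MAC keeps two VMs on different networks with the same fixed IP apart, which is the reason the source address is translated at all.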

The host haproxy will add the HTTP headers that the metadata API requires to each metadata request. The headers are built by a fixed algorithm and are easy to assemble. For each VM, haproxy will have an independent backend and a match rule that checks the source IP (aka META_IP). A request coming from a VM's META_IP is sent to the matching backend, which adds the HTTP headers and then forwards it to the real nova-metadata-api.
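
The header assembly can be sketched as follows. The sketch assumes the headers are the ones the existing Neutron metadata agent sets (X-Forwarded-For, X-Instance-ID, X-Tenant-ID, X-Instance-ID-Signature), and that the signature is an HMAC-SHA256 of the instance ID keyed with the metadata proxy shared secret. The function and parameter names are hypothetical.

```python
import hashlib
import hmac


def metadata_headers(instance_id, project_id, fixed_ip, shared_secret):
    """Build the headers a per-VM backend would add (hypothetical helper).

    Assumption: the signature is HMAC-SHA256 over the instance ID, keyed
    with the shared secret configured for the metadata proxy.
    """
    signature = hmac.new(
        shared_secret.encode("utf-8"),
        instance_id.encode("utf-8"),
        hashlib.sha256,
    ).hexdigest()
    return {
        "X-Forwarded-For": fixed_ip,      # the VM's real fixed IP, not META_IP
        "X-Instance-ID": instance_id,     # device_id of the port
        "X-Tenant-ID": project_id,
        "X-Instance-ID-Signature": signature,
    }
```

Because every input is fixed for the lifetime of a port, the values can be computed once when the backend for that VM is written, so nothing has to be looked up per request.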

Revision history for this message
Slawek Kaplonski (slaweq) wrote :

I think this can be discussed during the next drivers meeting.

tags: added: rfe-triaged
removed: rfe
Revision history for this message
Slawek Kaplonski (slaweq) wrote :

We discussed this RFE in today's drivers meeting and agreed to approve it as an idea. Now we will need a detailed spec with diagrams of how this will be implemented on the nodes.

tags: added: rfe-approved
removed: rfe-triaged
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to neutron-specs (master)
Changed in neutron:
status: New → In Progress
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Related fix merged to neutron-specs (master)

Reviewed: https://review.opendev.org/c/openstack/neutron-specs/+/802854
Committed: https://opendev.org/openstack/neutron-specs/commit/ebaa98925010b666d10ed64a116888a77e364790
Submitter: "Zuul (22348)"
Branch: master

commit ebaa98925010b666d10ed64a116888a77e364790
Author: LIU Yulong <email address hidden>
Date: Thu Jul 29 17:40:44 2021 +0800

    Spec for distributed datapath for metadata

    Related-Bug: #1933222
    Change-Id: Ice457e4ead492d3d128017a1bb551d482658ade5
