[RFE] Needs to restart metadata proxy with the start/restart of l3/dhcp agent

Bug #1808731 reported by cheng li
This bug affects 1 person
Affects: neutron
Status: Triaged
Importance: Undecided
Assigned to: cheng li
Milestone: (none listed)

Bug Description

The metadata proxy can be launched by the l3 agent or the dhcp agent, but neither agent stops it. Stopping the l3 agent or dhcp agent leaves the metadata proxy running, and there seems to be no way to stop it other than manually sending a kill signal to the metadata proxy process.
This blocks metadata proxy upgrades: because the old process is never killed, the new process never gets started.

Version: latest devstack on Ubuntu 16.04
Environment: ovs as mechanism_drivers

Reproduction Steps:
1. Stop the l3 agent with `systemctl stop <email address hidden>`. The metadata proxy process is still running.
2. Start the l3 agent. The metadata proxy process is unchanged.
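
One way to verify the reproduction steps above is to record the metadata proxy PIDs before stopping the agent and check which of them still exist afterwards. The following is a minimal sketch, assuming a POSIX host; `pid_alive` and `still_running` are hypothetical helper names, and how the PIDs are collected (for example from a process listing) is left to the operator.

```python
import os


def pid_alive(pid: int) -> bool:
    """Return True if a process with this PID currently exists (POSIX only)."""
    try:
        os.kill(pid, 0)  # signal 0 only checks existence and permissions
    except ProcessLookupError:
        return False
    except PermissionError:
        return True  # the process exists but belongs to another user
    return True


def still_running(pids_before: set[int]) -> set[int]:
    """PIDs recorded before the agent stop that are still alive now."""
    return {pid for pid in pids_before if pid_alive(pid)}
```

If `still_running` returns the same set after both step 1 and step 2, the proxy was neither stopped nor restarted by the agent.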

I propose stopping the metadata proxy when the l3 agent or dhcp agent is stopped.

cheng li (chengli3)
Changed in neutron:
assignee: nobody → cheng li (chengli3)
cheng li (chengli3)
description: updated
cheng li (chengli3)
description: updated
Slawek Kaplonski (slaweq) wrote :

I'm not sure that is a good idea, because stopping the metadata proxy process causes a data plane interruption: metadata will not be available to instances. IMO this shouldn't happen at any time.
Maybe we should clearly document that when you want to upgrade the metadata proxy, you should kill the existing processes and then restart the l3/dhcp agents.
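
The "kill existing processes" step of that manual procedure could look roughly like the sketch below. It is an illustration only, assuming a POSIX host and that the operator has already identified the metadata proxy PIDs; `terminate_processes` is a hypothetical helper, not part of neutron. Restarting the l3/dhcp agents afterwards is a separate step.

```python
import os
import signal
from collections.abc import Iterable


def terminate_processes(pids: Iterable[int]) -> list[int]:
    """Send SIGTERM to each PID and return the PIDs that were signalled."""
    signalled = []
    for pid in pids:
        try:
            os.kill(pid, signal.SIGTERM)
        except ProcessLookupError:
            continue  # process already gone, nothing to do
        signalled.append(pid)
    return signalled
```

Note that metadata is unavailable to instances between the kill and the agent restart, which is the data plane interruption discussed in this comment.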

cheng li (chengli3) wrote :

I would prefer to keep the upgrade steps as simple as possible. I have heard some OpenStack users complain about OpenStack upgrades.

To my knowledge, stopping the metadata proxy affects VM creation. In fact, stopping the l3/dhcp agent also affects VM creation.
If we stop the l3/dhcp agent but keep the metadata proxy running, what benefit do we get?

Slawek Kaplonski (slaweq) wrote :

1. If you spawn a new VM with only a fixed IP address, you don't need a working L3 agent, but you still need a working metadata service for it to boot properly.

2. What about other processes, like dnsmasq, keepalived, etc., which are spawned by the DHCP or L3 agents in the same way as the metadata proxy? Do you want to treat them the same way, which can break the data plane too, or do you want to treat them differently?

3. I think this should be handled as an RFE and discussed at the drivers meeting. I'm personally not a fan of this idea, but maybe others will have different opinions :)

tags: added: rfe
summary: - Needs the method to stop metadata proxy
+ [RFE] Needs the method to stop metadata proxy
Brian Haley (brian-haley) wrote : Re: [RFE] Needs the method to stop metadata proxy

I don't think the metadata proxy should be stopped when the l3-agent is stopped. Currently when the agent is stopped there is no data plane disruption - namespaces, IP addresses, floating IPs, etc are all left intact, which was intentional, since destroying everything and rebuilding (for example on a restart) could take a long time. We leave things alone and the agent will eventually converge at the end of its full-sync. Stopping the proxy breaks that, and so something like a soft reboot of a VM is no longer possible if the l3-agent is stopped since it will fail to get metadata.

The proxy will be stopped when the router is removed from the agent. Is there a reason that is not enough? Is it just that you want to update the haproxy package, since the proxy itself is really just haproxy?

cheng li (chengli3) wrote :

Since Slawek mentioned other processes like dnsmasq, I ran some tests on the dnsmasq process:
1. Stop the dhcp agent. dnsmasq is not stopped.
2. Start the dhcp agent. The dnsmasq process is restarted.

I think this is what we need for the metadata proxy: the haproxy process needs to be restarted on start/restart of the l3/dhcp agent. Otherwise, changed configuration will not take effect for the metadata proxy.
For example, suppose we change metadata_port from the default 9697 to 9699 and then restart the l3 agent and dhcp agent. With the current implementation the metadata proxy is not restarted, so it keeps using the old port 9697, which blocks VMs from getting metadata.

So it seems we can make the metadata proxy behave like the dnsmasq process: not stopped when the l3/dhcp agent stops, but *restarted* when the l3/dhcp agent starts or restarts.
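
The difference between the current behaviour and the dnsmasq-like behaviour proposed here can be illustrated with a small simulation. This is a toy model, not neutron code: `ProxyModel`, `simulate`, and the fake PID counter are hypothetical names introduced for illustration, and only the port numbers 9697 and 9699 come from the example in this comment.

```python
from dataclasses import dataclass
from itertools import count
from typing import Optional

_fake_pids = count(1000)  # stand-in for real PIDs in this simulation


@dataclass
class ProxyModel:
    """Toy model of a metadata proxy whose lifetime is tied to an agent."""
    restart_on_agent_start: bool
    pid: Optional[int] = None
    listen_port: Optional[int] = None

    def agent_stop(self) -> None:
        # Under both policies the proxy keeps running when the agent stops,
        # so the data plane is not interrupted.
        pass

    def agent_start(self, metadata_port: int) -> None:
        if self.pid is None or self.restart_on_agent_start:
            # Spawn (or respawn) using the configuration read at agent start.
            self.pid = next(_fake_pids)
            self.listen_port = metadata_port
        # Otherwise the existing process is left alone with its old settings.


def simulate(restart_on_agent_start: bool) -> ProxyModel:
    proxy = ProxyModel(restart_on_agent_start)
    proxy.agent_start(metadata_port=9697)  # initial start with the default
    proxy.agent_stop()
    proxy.agent_start(metadata_port=9699)  # restart after the config change
    return proxy
```

In this model `simulate(False)` ends up still listening on 9697 (the behaviour reported in this bug), while `simulate(True)` picks up 9699 at the cost of a brief proxy restart when the agent starts.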

Pawel Suder (pasuder) wrote :

IMO this issue seems to be related to a specific infra/deployment.

I have some questions:

- Does neutron handle the system units/scripts for starting services?
- How is neutron installed? Package, Python venv, container, etc.?
- Do you provide the deployment mechanism on your own, or is it provided by some vendor?

I am not sure whether neutron itself is able to start services from code. What is more, I guess that neutron manages units only on devstack. Am I right?

Pawel Suder (pasuder) wrote :

OK, I read the first message one more time: Version: latest devstack on Ubuntu 16.04.

In this case, you are in the right place. Please skip the questions from my previous comment.

cheng li (chengli3) wrote :

okay, let me update the title

summary: - [RFE] Needs the method to stop metadata proxy
+ [RFE] Needs to restart metadata proxy with the start/restart of l3/dhcp
+ agent
Pawel Suder (pasuder)
Changed in neutron:
status: New → Triaged
Miguel Lavalle (minsel) wrote :

I am pretty sure that for the vast majority of users, the main concern (and assumption) is not to disrupt the data plane when the control plane is re-started. Changing this assumption is highly risky so I don't think we should change it. The upgrading of base software in a deployment's hosts should be part of on-going admin processes where hosts are vacated, upgraded and then populated again, which are beyond the scope of Neutron. But I don't think we should break the promise of Neutron not disrupting the data plane on its own.

cheng li (chengli3) wrote :

It's not only about upgrades, but also about neutron config updates. As I said in comment #5, if we don't restart the metadata proxy, the changed configuration will not take effect.

Miguel Lavalle (minsel) wrote :

In today's drivers meeting it became clear that we don't have an actual use case to justify the change proposed in this RFE. The submitter agreed to start a thread on the mailing list to identify operators who might see a benefit in the proposed change.

Miguel Lavalle (minsel) wrote :

Marking this RFE as postponed until we hear again from submitter

tags: added: rfe-postponed
removed: rfe