metadata-proxy process stops listening on port 80

Bug #1843801 reported by Mithil Arun
This bug affects 1 person
Affects: neutron
Status: Expired
Importance: Undecided
Assigned to: Unassigned
Milestone: (none)

Bug Description

I'm running the metadata agent on a provider network, and the metadata service randomly stops listening on port 80.

I see that the process itself is running, but port 80 is not open in the DHCP namespace. There are no logs in neutron-server, neutron-metadata-agent, neutron-dhcp-agent or journalctl.

The only way to recover is to kill ns-metadata-proxy and have neutron-metadata-agent restart it, at which point the port is up again.

In addition to monitoring the process itself, neutron-metadata-agent must watch for port 80 in the namespace as well.

Environment: Ubuntu 16.04 running Neutron Rocky.

Slawek Kaplonski (slaweq) wrote :

Neutron spawns an haproxy process which should listen on port 80. Can you check journal.log for anything related to this haproxy process?

Brian Haley (brian-haley) wrote :

I'm not sure there is a bug here, but wanted to correct some of the above information and ask some questions.

On provider networks without an l3-agent, the dhcp-agent starts a metadata proxy in its own namespace. This is enabled in dhcp_agent.ini by setting enable_isolated_metadata = True.
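A quick way to confirm that option is set is a grep over the agent config. This is a minimal sketch, assuming the file lives at /etc/neutron/dhcp_agent.ini (the path depends on packaging) and that the value is spelled True; it does not check which [section] the line sits in, and the function name is hypothetical.

```shell
#!/bin/sh
# Succeeds if enable_isolated_metadata is set to True in the given ini file.
# Assumptions: default path /etc/neutron/dhcp_agent.ini; the [section] is not checked;
# commented-out lines (starting with #) are ignored because they don't match the anchor.
isolated_metadata_enabled() {
    conf="${1:-/etc/neutron/dhcp_agent.ini}"
    grep -Eiq '^[[:space:]]*enable_isolated_metadata[[:space:]]*=[[:space:]]*true[[:space:]]*$' "$conf"
}

# Example:
#   isolated_metadata_enabled && echo "isolated metadata is enabled"
```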

In Rocky, this namespace proxy should be haproxy; for example, from ps output:

  haproxy -f /opt/stack/data/neutron/ns-metadata-proxy/b4d7444e-9549-4156-8596-09eb8b81253a.conf
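Given a network ID, the matching haproxy PID can be picked out of `ps -ef` output. A sketch, assuming the conf-file naming shown above (ns-metadata-proxy/<network-id>.conf); the function name is hypothetical. Matching on the command name in field 8 keeps the awk process itself, whose arguments contain the same strings, out of the result.

```shell
#!/bin/sh
# Reads `ps -ef` output on stdin and prints the PID of the haproxy process
# started with ns-metadata-proxy/<network-id>.conf.
# Assumption: standard `ps -ef` layout (PID in field 2, command name in field 8).
metadata_proxy_pid() {
    awk -v conf="ns-metadata-proxy/$1.conf" '
        ($8 == "haproxy" || $8 ~ /\/haproxy$/) && index($0, conf) { print $2 }'
}

# Example:
#   ps -ef | metadata_proxy_pid ead88ed3-f1e0-4498-8c1e-6d091083ae33
```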

That process should log to /var/log/haproxy.log* or /var/log/syslog depending on the setup.

It connects to the metadata-agent via a Unix Domain Socket from inside the namespace.

The dhcp-agent is responsible for monitoring the process, but it only restarts it when the process exits; it doesn't do health checks on port 80.

However, the haproxy process does not listen on port 80; it binds to 0.0.0.0:9697, and an iptables rule redirects port 80 to that port. For example, in a router namespace there is:

-A neutron-l3-agent-PREROUTING -d 169.254.169.254/32 -i qr-+ -p tcp -m tcp --dport 80 -j REDIRECT --to-ports 9697
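Note that `iptables -L` without `-t` lists only the filter table, while REDIRECT rules live in the nat table, so the rule has to be looked for with `iptables -t nat -S` or `iptables-save -t nat`. A minimal sketch that checks such a dump for the rule text quoted above (the function name is hypothetical, and it assumes the rule is rendered in the same `-A ...` form):

```shell
#!/bin/sh
# Reads `iptables-save -t nat` (or `iptables -t nat -S`) output on stdin and
# succeeds if a REDIRECT from tcp/80 to the given haproxy port (default 9697) exists.
has_metadata_redirect() {
    port="${1:-9697}"
    # -F: fixed string; -e: the pattern itself starts with "--"
    grep -Fq -e "--dport 80 -j REDIRECT --to-ports ${port}"
}

# Example (router namespace):
#   ip netns exec qrouter-<router-id> iptables-save -t nat | has_metadata_redirect
```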

Could you provide more information, such as:

1) Is the iptables redirect rule present?
2) Is the haproxy process running and listening on port 9697?
3) Is there anything in /var/log/* regarding haproxy?
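For question 2, the check has to run inside the namespace: a plain `netstat -nltp` on the host only shows sockets in the root network namespace. A sketch that reads `netstat -nltp` output and tests for a listener on a given port (hypothetical function name; assumes netstat's default column layout, with the local address in field 4 and the state in field 6):

```shell
#!/bin/sh
# Reads `netstat -nltp` output on stdin and succeeds if some TCP socket is in
# LISTEN state on the given port, on any local address (IPv4 or IPv6).
listening_on() {
    awk -v port="$1" '
        $6 == "LISTEN" { n = split($4, a, ":"); if (a[n] == port) found = 1 }
        END { exit !found }'
}

# Example (DHCP namespace):
#   ip netns exec qdhcp-<network-id> netstat -nltp | listening_on 80 && echo listening
```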

Changed in neutron:
status: New → Incomplete

Mithil Arun (arun-mithil) wrote :

Thanks for looking at this!

My bad, I meant to say haproxy. I see that the process is running:
--snip--
ops 19090 1 0 Sep09 ? 00:00:06 haproxy -f /var/opt/ops/neutron/ns-metadata-proxy/ead88ed3-f1e0-4498-8c1e-6d091083ae33.conf
--snip--

I don't see any process listening on port 9697:
--snip--
root@kvm03:/var/log# netstat -nltp | grep 9697
root@kvm03:/var/log#
--snip--

Nor are there any iptables rules present in the namespace:
--snip--
root@kvm03:/var/log# ip netns exec qdhcp-ead88ed3-f1e0-4498-8c1e-6d091083ae33 iptables -L
Chain INPUT (policy ACCEPT)
target prot opt source destination
neutron-dhcpd-INPUT all -- anywhere anywhere

Chain FORWARD (policy ACCEPT)
target prot opt source destination
neutron-filter-top all -- anywhere anywhere
neutron-dhcpd-FORWARD all -- anywhere anywhere

Chain OUTPUT (policy ACCEPT)
target prot opt source destination
neutron-filter-top all -- anywhere anywhere
neutron-dhcpd-OUTPUT all -- anywhere anywhere

Chain neutron-dhcpd-FORWARD (1 references)
target prot opt source destination

Chain neutron-dhcpd-INPUT (1 references)
target prot opt source destination

Chain neutron-dhcpd-OUTPUT (1 references)
target prot opt source destination

Chain neutron-dhcpd-local (1 references)
target prot opt source destination

Chain neutron-filter-top (2 references)
target prot opt source destination
neutron-dhcpd-local all -- anywhere anywhere
--snip--

The port is open in the DHCP namespace though:
--snip--
root@kvm03:/var/log# ip netns exec qdhcp-ead88ed3-f1e0-4498-8c1e-6d091083ae33 netstat -nltp
Active Internet connections (only servers)
Proto Recv-Q Send-Q Local Address Foreign Address State PID/Program name
tcp 0 0 169.254.169.254:80 0.0.0.0:* LISTEN 19090/haproxy
tcp 0 0 127.0.0.1:53 0.0.0.0:* LISTEN 30720/dnsmasq
tcp 0 0 10.128.144.23:53 0.0.0.0:* LISTEN 30720/dnsmasq
tcp 0 0 169.254.169.254:53 0.0.0.0:* LISTEN 30720/dnsmasq
tcp6 0 0 ::1:53 :::* LISTEN 30720/dnsmasq
tcp6 0 0 fe80::f816:3eff:febb:53 :::* LISTEN 30720/dnsmasq
--snip--

And I can reach the metadata service from a VM:
--snip--
$ curl http://169.254.169.254/latest/meta-data/instance-id
i-000072f5$
--snip--

Occasionally, though (about once a week on average), nothing is listening on port 80 in the namespace, unlike in the output above.

No relevant logs in /var/log/haproxy.log* or in /var/log/syslog.

Brian Haley (brian-haley) wrote :

Maybe I'm so used to debugging this in the qrouter namespace that I got some of the details backwards, but the address haproxy listens on is set in the conf file it was started with.
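Since the listen address comes from the conf file, it can be read straight from there. A sketch, assuming the generated file uses haproxy's standard `bind <address>:<port>` directive with the address as the first argument; the function name is hypothetical.

```shell
#!/bin/sh
# Prints the address:port argument of every `bind` line in an haproxy config file.
# Assumption: one address per bind line (extra bind options, if any, are ignored).
haproxy_bind_addresses() {
    awk '$1 == "bind" { print $2 }' "$1"
}

# Example:
#   haproxy_bind_addresses /var/opt/ops/neutron/ns-metadata-proxy/ead88ed3-f1e0-4498-8c1e-6d091083ae33.conf
```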

So in your case, when you see nothing listening on port 80, is there still an haproxy process running?

Mithil Arun (arun-mithil) wrote :

That's right. The process continues to run. I know the process belongs to this DHCP namespace because it uses the conf file that has the same network ID in its name.

We also saw this before we upgraded to Rocky, when haproxy was not yet used. I was hoping the move to haproxy would fix things, but it didn't.

Brian Haley (brian-haley) wrote :

Thanks for the info.

I'm just not sure how we'd fix this, since the process monitor we run is only intended to catch cases where a daemon exits, not where it's misbehaving.

Can you take a look at this discussion and make sure you have haproxy >= 1.8.15? It seems someone else had a similar problem with it.

https://discourse.haproxy.org/t/haproxy-stops-serving-frontend-requests-while-not-closing-backend-connections-at-100-cpu-utilisation/3292/8
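The version check can be scripted. A sketch, assuming the first line of `haproxy -v` has the form "HA-Proxy version X.Y.Z <date>" and that GNU `sort -V` is available; distribution builds may append a package suffix to the version, which this does not strip. Both function names are hypothetical.

```shell
#!/bin/sh
# Succeeds if version $1 >= version $2, using GNU sort's version ordering:
# the smaller of the two sorts first, so $1 >= $2 exactly when $2 comes first.
version_ge() {
    [ "$(printf '%s\n%s\n' "$2" "$1" | sort -V | head -n 1)" = "$2" ]
}

# Prints the version field from the first line of `haproxy -v`,
# e.g. "HA-Proxy version 1.6.3 2015/12/25" -> "1.6.3".
haproxy_version() {
    haproxy -v 2>/dev/null | awk 'NR == 1 { print $3 }'
}

# Example:
#   v=$(haproxy_version)
#   version_ge "$v" 1.8.15 || echo "haproxy $v is older than 1.8.15"
```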

Mithil Arun (arun-mithil) wrote :

Ah! I'm running 1.6.3. Let me upgrade to >= 1.8.15 and check again. I'll report back here if that doesn't fix things.

I'm wondering, though, whether it makes sense to add functionality to the process monitor to catch cases where the port isn't up. There is little point in the process running if it isn't listening on any port. I can help with the development effort, but I'd first like to know whether the community agrees with this approach.
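Until something like that exists in neutron, the idea can be approximated outside it. The following is a hypothetical watchdog sketch, not part of neutron or its process monitor: if the haproxy for a network is running but owns no listening TCP socket in its qdhcp namespace, it kills the process and relies on the agent respawning it, which is the manual recovery described in the report. It assumes the ns-metadata-proxy/<network-id>.conf naming, root privileges, and the `netstat -nltp` column layout shown above; it does not distinguish a missing namespace from a dead listener.

```shell
#!/bin/sh
# Reads `netstat -nltp` output on stdin; succeeds if PID owns a LISTEN socket
# (field 7 looks like "19090/haproxy").
pid_is_listening() {
    awk -v pid="$1" '$6 == "LISTEN" && $7 ~ ("^" pid "/") { found = 1 }
                     END { exit !found }'
}

# Hypothetical check for one network; intended to run periodically, e.g. from cron.
check_metadata_proxy() {
    net_id="$1"
    pid=$(ps -eo pid=,args= | awk -v c="ns-metadata-proxy/$net_id.conf" '
        ($2 == "haproxy" || $2 ~ /\/haproxy$/) && index($0, c) { print $1; exit }')
    # Not running at all: the agent's process monitor already handles this case.
    [ -n "$pid" ] || return 0
    if ! ip netns exec "qdhcp-$net_id" netstat -nltp | pid_is_listening "$pid"; then
        logger -t metadata-watchdog "haproxy $pid for $net_id has no listener; killing"
        kill "$pid"
    fi
}

# Example:
#   check_metadata_proxy ead88ed3-f1e0-4498-8c1e-6d091083ae33
```

Keying the check on the PID rather than on port 80 keeps it independent of where the conf file tells haproxy to bind (169.254.169.254:80 in the DHCP namespace, 0.0.0.0:9697 in a router namespace).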

Launchpad Janitor (janitor) wrote :

[Expired for neutron because there has been no activity for 60 days.]

Changed in neutron:
status: Incomplete → Expired