Neutron metadata service returns http code 500 if nova metadata service is down

Bug #2059032 reported by Anton Kurbatov
This bug affects 1 person
Affects   Status         Importance   Assigned to      Milestone
neutron   Fix Released   High         Anton Kurbatov

Bug Description

We discovered that when the nova metadata service is down, the neutron metadata service logs a stack trace and returns a 500 HTTP code to the user.

Demo on a newly installed devstack

$ systemctl stop <email address hidden>

Then inside a VM:

$ curl http://169.254.169.254/latest/meta-data/hostname
<html>
 <head>
  <title>500 Internal Server Error</title>
 </head>
 <body>
  <h1>500 Internal Server Error</h1>
  An unknown error has occurred. Please try your request again.<br /><br />
 </body>
</html>$

Stack trace:

ERROR neutron.agent.metadata.agent Traceback (most recent call last):
ERROR neutron.agent.metadata.agent File "/opt/stack/neutron/neutron/agent/metadata/agent.py", line 85, in __call__
ERROR neutron.agent.metadata.agent res = self._proxy_request(instance_id, tenant_id, req)
ERROR neutron.agent.metadata.agent File "/opt/stack/neutron/neutron/agent/metadata/agent.py", line 249, in _proxy_request
ERROR neutron.agent.metadata.agent resp = requests.request(method=req.method, url=url,
ERROR neutron.agent.metadata.agent File "/usr/local/lib/python3.9/site-packages/requests/api.py", line 59, in request
ERROR neutron.agent.metadata.agent return session.request(method=method, url=url, **kwargs)
ERROR neutron.agent.metadata.agent File "/usr/local/lib/python3.9/site-packages/requests/sessions.py", line 589, in request
ERROR neutron.agent.metadata.agent resp = self.send(prep, **send_kwargs)
ERROR neutron.agent.metadata.agent File "/usr/local/lib/python3.9/site-packages/requests/sessions.py", line 703, in send
ERROR neutron.agent.metadata.agent r = adapter.send(request, **kwargs)
ERROR neutron.agent.metadata.agent File "/usr/local/lib/python3.9/site-packages/requests/adapters.py", line 519, in send
ERROR neutron.agent.metadata.agent raise ConnectionError(e, request=request)
ERROR neutron.agent.metadata.agent requests.exceptions.ConnectionError: HTTPConnectionPool(host='10.136.16.184', port=8775): Max retries exceeded with url: /latest/meta-data/hostname (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x7f3ce93f38b0>: Failed to establish a new connection: [Errno 111] ECONNREFUSED'))
ERROR neutron.agent.metadata.agent
INFO eventlet.wsgi.server [-] ::ffff:192.168.100.14,<local> "GET /latest/meta-data/hostname HTTP/1.1" status: 500 len: 362 time: 0.1392403

In our installation, the nova service is also behind nginx. If we stop the nova metadata service there, we still get a 500 HTTP code, but with a different traceback:

2024-03-25 20:27:01.985 24 ERROR neutron.agent.metadata.agent [-] Unexpected error.: Exception: Unexpected response code: 502
2024-03-25 20:27:01.985 24 ERROR neutron.agent.metadata.agent Traceback (most recent call last):
2024-03-25 20:27:01.985 24 ERROR neutron.agent.metadata.agent File "/usr/lib/python3.6/site-packages/neutron/agent/metadata/agent.py", line 93, in __call__
2024-03-25 20:27:01.985 24 ERROR neutron.agent.metadata.agent res = self._proxy_request(instance_id, tenant_id, req)
2024-03-25 20:27:01.985 24 ERROR neutron.agent.metadata.agent File "/usr/lib/python3.6/site-packages/neutron/agent/metadata/agent.py", line 288, in _proxy_request
2024-03-25 20:27:01.985 24 ERROR neutron.agent.metadata.agent resp.status_code)
2024-03-25 20:27:01.985 24 ERROR neutron.agent.metadata.agent Exception: Unexpected response code: 502
2024-03-25 20:27:01.985 24 ERROR neutron.agent.metadata.agent
2024-03-25 20:27:01.988 24 INFO eventlet.wsgi.server [-] 10.197.115.207,<local> "GET /latest/meta-data/hostname HTTP/1.1" status: 500 len: 362 time: 0.1369441

I think gateway errors such as the 502 returned by nginx here should also be handled more gracefully.

These 500 HTTP codes worry us because we are building an alerting system, and 500 codes are one of its alert criteria.
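
To illustrate the alerting concern, here is a minimal sketch of counting 500 responses from access-log lines shaped like the eventlet.wsgi.server lines quoted above. The regular expression and the function name `count_500_responses` are assumptions made for this example, not part of Neutron or of any alerting tool.

```python
import re

# Matches the request and status portion of an access-log line such as:
#   ... "GET /latest/meta-data/hostname HTTP/1.1" status: 500 len: 362 ...
# The pattern is an assumption based only on the lines quoted in this report.
STATUS_RE = re.compile(r'"[A-Z]+ \S+ HTTP/[\d.]+" status: (?P<status>\d{3})')


def count_500_responses(lines):
    """Count log lines whose reported HTTP status is exactly 500."""
    count = 0
    for line in lines:
        match = STATUS_RE.search(line)
        if match and match.group("status") == "500":
            count += 1
    return count
```

With a check like this, every request made while the nova metadata service is down counts as an internal error, even though the neutron metadata agent itself is working.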

OpenStack Infra (hudson-openstack) wrote : Fix proposed to neutron (master)

Fix proposed to branch: master
Review: https://review.opendev.org/c/openstack/neutron/+/914154

Changed in neutron:
status: New → In Progress
Changed in neutron:
importance: Undecided → High
assignee: nobody → Anton Kurbatov (akurbatov)
OpenStack Infra (hudson-openstack) wrote : Fix proposed to neutron (stable/2024.1)

Fix proposed to branch: stable/2024.1
Review: https://review.opendev.org/c/openstack/neutron/+/914260

OpenStack Infra (hudson-openstack) wrote : Fix merged to neutron (master)

Reviewed: https://review.opendev.org/c/openstack/neutron/+/914154
Committed: https://opendev.org/openstack/neutron/commit/6395b4fe8ed99855853587fa93cb59fd2691aed5
Submitter: "Zuul (22348)"
Branch: master

commit 6395b4fe8ed99855853587fa93cb59fd2691aed5
Author: Anton Kurbatov <email address hidden>
Date: Mon Mar 25 18:49:52 2024 +0000

    Fixing the 500 HTTP code in the metadata service if Nova is down

    If the Nova metadata service is unavailable, the requests.request()
    function may raise a ConnectionError. This results in the upper code
    returning a 500 HTTP status code to the user along with a traceback.
    Let's handle this scenario and instead return a 503 HTTP status code
    (service unavailable).

    If the Nova service is down and is behind another proxy (such as
    Nginx), then instead of a ConnectionError, the request may result in
    receiving a 502 or 503 HTTP status code. Let's also consider this
    situation and add support for an additional 504 code.

    Closes-Bug: #2059032
    Change-Id: I16be18c46a6796224b0793dc385b0ddec01739c4
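
The behaviour described in the commit message can be sketched roughly as follows. This is not the merged Neutron code: `status_for_client`, `EXPECTED` and `GATEWAY_ERRORS` are hypothetical names, the list of expected codes is illustrative, and relaying gateway codes unchanged is an assumption, since the commit message only says these codes are now supported.

```python
from http import HTTPStatus

# Codes the proxy already knew how to relay (illustrative list, an assumption).
EXPECTED = frozenset({
    HTTPStatus.OK, HTTPStatus.BAD_REQUEST, HTTPStatus.FORBIDDEN,
    HTTPStatus.NOT_FOUND, HTTPStatus.CONFLICT,
    HTTPStatus.INTERNAL_SERVER_ERROR,
})

# Codes a proxy such as nginx may return when the backend behind it is down.
GATEWAY_ERRORS = frozenset({
    HTTPStatus.BAD_GATEWAY,          # 502
    HTTPStatus.SERVICE_UNAVAILABLE,  # 503
    HTTPStatus.GATEWAY_TIMEOUT,      # 504
})


def status_for_client(forward):
    """Return the HTTP status to send back to the VM (hypothetical helper).

    `forward` is any zero-argument callable that performs the upstream
    request and returns its integer status code.
    """
    try:
        upstream = forward()
    except OSError:
        # Connection failures surface as OSError subclasses; both
        # requests.exceptions.ConnectionError and the built-in
        # ConnectionRefusedError derive from OSError.
        return int(HTTPStatus.SERVICE_UNAVAILABLE)
    if upstream in EXPECTED or upstream in GATEWAY_ERRORS:
        # Assumption: relay the upstream code unchanged.
        return upstream
    # Anything else is still treated as an unexpected internal error.
    return int(HTTPStatus.INTERNAL_SERVER_ERROR)
```

In this sketch, a refused connection yields 503 and a 502 from an intermediate proxy is relayed as 502, so neither case shows up as a 500.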

Changed in neutron:
status: In Progress → Fix Released
OpenStack Infra (hudson-openstack) wrote : Fix merged to neutron (stable/2024.1)

Reviewed: https://review.opendev.org/c/openstack/neutron/+/914260
Committed: https://opendev.org/openstack/neutron/commit/646270de5b740e6bc35f070ababf3e4f14e47f38
Submitter: "Zuul (22348)"
Branch: stable/2024.1

commit 646270de5b740e6bc35f070ababf3e4f14e47f38
Author: Anton Kurbatov <email address hidden>
Date: Mon Mar 25 18:49:52 2024 +0000

    Fixing the 500 HTTP code in the metadata service if Nova is down

    If the Nova metadata service is unavailable, the requests.request()
    function may raise a ConnectionError. This results in the upper code
    returning a 500 HTTP status code to the user along with a traceback.
    Let's handle this scenario and instead return a 503 HTTP status code
    (service unavailable).

    If the Nova service is down and is behind another proxy (such as
    Nginx), then instead of a ConnectionError, the request may result in
    receiving a 502 or 503 HTTP status code. Let's also consider this
    situation and add support for an additional 504 code.

    Closes-Bug: #2059032
    Change-Id: I16be18c46a6796224b0793dc385b0ddec01739c4
    (cherry picked from commit 6395b4fe8ed99855853587fa93cb59fd2691aed5)
