console-log returns 500 if n-cpu is shut down

Bug #1735329 reported by hongbin
This bug affects 1 person
Affects                     Status     Importance  Assigned to  Milestone
OpenStack Compute (nova)    Triaged    Low         Unassigned
  Pike                      Confirmed  Low         Unassigned

Bug Description

Description
===========
If a compute host fails (i.e. the n-cpu process is down), a console-log request fails with HTTP 500.

Steps to reproduce
==================

$ sudo systemctl stop devstack@n-cpu
$ nova console-log test2
ERROR (ClientException): Unexpected API Error. Please report this at http://bugs.launchpad.net/nova/ and attach the Nova API log if possible.
<class 'oslo_messaging.exceptions.MessagingTimeout'> (HTTP 500) (Request-ID: req-e29f2aa4-1cf2-4db9-b327-f5a3ef6d7f6f)

Expected result
===============
Nova should respond with a 4xx error.

Actual result
=============
Nova responded with HTTP 500.

Environment
===========
1. Exact version of OpenStack you are running:
   Devstack

2. Which hypervisor did you use?
   Libvirt + KVM

3. Which storage type did you use?
   LVM

4. Which networking type did you use?
   OVS

hongbin (hongbin034)
Changed in nova:
assignee: nobody → hongbin (hongbin034)
Takashi Natsume (natsume-takashi) wrote :

Set the status to 'In-Progress' because this report has an assignee.

Changed in nova:
status: New → In Progress
hongbin (hongbin034) wrote :

Waiting for someone to triage this report, as I am not sure whether this is a valid bug.

Changed in nova:
status: In Progress → New
Matt Riedemann (mriedem) wrote :

I think the API should return a 409 in this case. It already returns a 409 in other cases, so that seems like a good fit.
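This is a minimal sketch of the 409 approach. It assumes the API handler wraps the compute RPC call in a try/except and translates a timeout into a conflict. `MessagingTimeout` below is a local stand-in for `oslo_messaging.exceptions.MessagingTimeout`. `HTTPError` and `console_output_handler` are hypothetical names, not nova's actual API code. This path still only fires after the RPC call has timed out.

```python
from http import HTTPStatus
from typing import Callable


class MessagingTimeout(Exception):
    """Local stand-in for oslo_messaging's MessagingTimeout (assumption)."""


class HTTPError(Exception):
    """Hypothetical API error carrying an HTTP status code."""

    def __init__(self, status: HTTPStatus, explanation: str) -> None:
        super().__init__(explanation)
        self.status = status


def console_output_handler(rpc_call: Callable[[str], str], instance_id: str) -> str:
    """Hypothetical handler: map an RPC timeout to 409 instead of a 500."""
    try:
        return rpc_call(instance_id)
    except MessagingTimeout as exc:
        # The compute service never answered, so report a conflict with the
        # instance's current state rather than an unexpected server error.
        raise HTTPError(
            HTTPStatus.CONFLICT,
            f"Console output for {instance_id} is unavailable: "
            "the compute service did not respond.",
        ) from exc
```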

Matt Riedemann (mriedem) wrote :

I'm not sure what the default HTTP response timeout is for Apache or uWSGI, but keep in mind that the default rpc_response_timeout is 60 seconds. If we wait for the MessagingTimeout before handling this, the response to the actual client may already have timed out.

https://stackoverflow.com/questions/24127601/uwsgi-request-timeout-in-python
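The race described above can be modeled in a few lines. This sketch only takes the 60-second rpc_response_timeout default from the comment above. The web server timeout is a parameter because its default for Apache or uWSGI is not known here. `client_outcome` is a hypothetical helper.

```python
from dataclasses import dataclass

# Default rpc_response_timeout in seconds, as stated in the comment above.
RPC_RESPONSE_TIMEOUT_S = 60.0


@dataclass
class Outcome:
    seen_by_client: str
    elapsed_s: float


def client_outcome(http_timeout_s: float,
                   rpc_timeout_s: float = RPC_RESPONSE_TIMEOUT_S) -> Outcome:
    """Hypothetical model: which timeout fires first when n-cpu is down?"""
    if http_timeout_s <= rpc_timeout_s:
        # The web server gives up before the API can translate the
        # MessagingTimeout; a tie is counted here too, since the API still
        # needs a moment to build its error response after the RPC timeout.
        return Outcome("web server timeout", http_timeout_s)
    # Otherwise the API sees the MessagingTimeout and can return its own code.
    return Outcome("API error response", rpc_timeout_s)
```

With an illustrative 30-second web server timeout, the client never sees the API's error code. With 90 seconds it does, but only after a full minute.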

Changed in nova:
status: New → Triaged
importance: Undecided → Low
OpenStack Infra (hudson-openstack) wrote : Fix proposed to nova (master)

Fix proposed to branch: master
Review: https://review.openstack.org/525335

Changed in nova:
status: Triaged → In Progress
hongbin (hongbin034) wrote :

Hi Matt,

After looking into this case, I think raising a 5xx error is better, since the error is caused by a server failure. I propose using 503: https://review.openstack.org/#/c/525335/ . What do you think?

Regarding the uWSGI default timeout and its potential conflict with the RPC timeout, I couldn't think of a perfect solution. Perhaps we can check whether the nova-compute service is down and fail early to avoid the long RPC timeout.
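This sketch shows the fail-early idea. It assumes the API can read the compute service's last heartbeat time and treats the service as down when that heartbeat is older than a threshold. The 60-second threshold mirrors nova's `service_down_time` default but is an assumption here. `ServiceUnavailable`, `service_is_up`, and `console_output_fail_fast` are hypothetical names. The sketch uses 503 as proposed in this comment; swapping in 409 is a one-line change.

```python
from datetime import datetime, timedelta, timezone
from http import HTTPStatus
from typing import Callable, Optional

# Assumed heartbeat threshold (illustrative, modeled on service_down_time).
SERVICE_DOWN_TIME = timedelta(seconds=60)


class ServiceUnavailable(Exception):
    """Hypothetical API error raised before any RPC call is attempted."""

    status = HTTPStatus.SERVICE_UNAVAILABLE


def service_is_up(last_heartbeat: Optional[datetime], now: datetime) -> bool:
    """A compute service counts as up if it reported within the threshold."""
    if last_heartbeat is None:
        return False
    return now - last_heartbeat <= SERVICE_DOWN_TIME


def console_output_fail_fast(
    last_heartbeat: Optional[datetime],
    rpc_call: Callable[[], str],
    now: Optional[datetime] = None,
) -> str:
    """Check the heartbeat first so a dead n-cpu fails immediately."""
    if now is None:
        now = datetime.now(timezone.utc)
    if not service_is_up(last_heartbeat, now):
        raise ServiceUnavailable("nova-compute on the instance's host is down")
    # The service looked alive; it can still die mid-call, so the RPC
    # timeout path remains necessary as a fallback.
    return rpc_call()
```

The check is only a heuristic. A service that stops just after its last heartbeat still passes the check, so the RPC timeout handling is still needed.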

Changed in nova:
assignee: hongbin (hongbin034) → Hongbin Lu (hongbin.lu)
OpenStack Infra (hudson-openstack) wrote : Change abandoned on nova (master)

Change abandoned by Balazs Gibizer (<email address hidden>) on branch: master
Review: https://review.opendev.org/525335
Reason: The patch is stale with negative review. Feel free to restore it (or ping gibi on IRC to do so) if you are still working on this.

Changed in nova:
status: In Progress → Triaged
assignee: Hongbin Lu (hongbin.lu) → nobody