placement returns 503 when keystone is down

Bug #1749797 reported by Jim Rollenhagen
This bug affects 1 person
Affects                   Status        Importance  Assigned to     Milestone
OpenStack Compute (nova)  Invalid       Undecided   Unassigned
keystonemiddleware        Fix Released  Undecided   Colleen Murphy

Bug Description

See the logs here: http://logs.openstack.org/50/544750/8/check/ironic-grenade-dsvm-multinode-multitenant/5713fb8/logs/screen-placement-api.txt.gz#_Feb_15_17_58_22_463228

This is during an upgrade while Keystone is down. Placement returns a 503 because it cannot reach keystone.

I'm not sure what the expected behavior should be, but a 503 feels wrong.

Matt Riedemann (mriedem)
tags: added: placement upgrade
Changed in nova:
status: New → Confirmed
Chris Dent (cdent) wrote :

Thanks for reporting this, jroll. For a long time I had an item on my to-do list along the lines of "placement weird when keystone down, investigate", but without enough detail to remember what it was when I went back to the old items in the list, so I deleted it. It was a note I had made while I was digging into something else and didn't want to get distracted.

The logs show the response is coming from keystonemiddleware.auth_token, which placement uses pretty much directly as a straight up wsgi middleware, not adapting the response in any way.

So if this is a bug, it's a bug there (and I've added the project). However, it is by design:

* https://github.com/openstack/keystonemiddleware/blob/master/keystonemiddleware/auth_token/__init__.py#L776-L781
* https://docs.openstack.org/keystonemiddleware/latest/middlewarearchitecture.html#specification-overview

It's a potentially weird interpretation of 503 ( https://httpstatuses.com/503 ) but not too far off base.

Since keystone is required to do the auth, I'm not really sure what ought to happen here. A clearer message about _which_ service is unavailable might be useful: "Bad response code while validating token: 503: ServiceUnavailable: Service Unavailable (HTTP 503)"

Any 4xx error is suspect because there's nothing wrong with the client's request, other than bad luck about the time they asked it.

What other alternatives are there?

Jim Rollenhagen (jim-rollenhagen) wrote :

Ah, gotcha. That makes sense, thanks Chris.

I do think we should make placement's error message better for the client. Currently it is: {"message": "The server is currently unavailable. Please try again at a later time.<br /><br />\n\n\n", "code": "503 Service Unavailable", "title": "Service Unavailable"}. See also the compute logs from the same run: http://logs.openstack.org/50/544750/8/check/ironic-grenade-dsvm-multinode-multitenant/5713fb8/logs/screen-n-cpu.txt.gz#_Feb_15_17_58_22_470073

I'm thinking we can catch webob.exc.HTTPServiceUnavailable from placement's wrapper around the auth_token middleware, and transform the message there. Does that work for you?
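For illustration only, here is a minimal sketch of that wrapper idea. It uses just the standard library instead of webob, and it is not placement's actual wrapper: the class name `Clarify503` and the message text are hypothetical. It inspects the status produced by the wrapped WSGI app and, on a 503, swaps in a body that says which service is really down.

```python
import json


class Clarify503(object):
    """Hypothetical WSGI wrapper that rewrites a 503 from the wrapped app."""

    def __init__(self, app, message):
        self.app = app          # e.g. an auth middleware wrapped around the API
        self.message = message  # text naming the service that is really down

    def __call__(self, environ, start_response):
        captured = {}

        def capture(status, headers, exc_info=None):
            # Hold the inner status and headers until we decide what to send.
            captured['status'] = status
            captured['headers'] = headers
            # The legacy write() callable is ignored in this sketch.
            return lambda data: None

        result = self.app(environ, capture)
        try:
            # Drain the body so a lazily started response is captured too.
            chunks = list(result)
        finally:
            if hasattr(result, 'close'):
                result.close()

        if captured['status'].startswith('503'):
            body = json.dumps({
                'title': 'Service Unavailable',
                'code': captured['status'],
                'message': self.message,
            }).encode('utf-8')
            start_response(captured['status'], [
                ('Content-Type', 'application/json'),
                ('Content-Length', str(len(body))),
            ])
            return [body]

        # Anything other than a 503 passes through unchanged.
        start_response(captured['status'], captured['headers'])
        return chunks
```

The drawback is that every consumer of the middleware would need its own copy of a wrapper like this.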

Jim Rollenhagen (jim-rollenhagen) wrote :

Continued thinking about this, and maybe we should just put a clearer message in the auth_token middleware itself? Then all consumers will get the clear message. Win.

Chris Dent (cdent) wrote :

Yeah, I think the second option is better because in many situations there won't be any wrapper (placement didn't have one until very recently).

OpenStack Infra (hudson-openstack) wrote : Fix proposed to keystonemiddleware (master)

Fix proposed to branch: master
Review: https://review.openstack.org/546108

Changed in keystonemiddleware:
assignee: nobody → Chris Dent (cdent)
status: New → In Progress
Changed in keystonemiddleware:
assignee: Chris Dent (cdent) → Colleen Murphy (krinkle)
OpenStack Infra (hudson-openstack) wrote : Fix merged to keystonemiddleware (master)

Reviewed: https://review.openstack.org/546108
Committed: https://git.openstack.org/cgit/openstack/keystonemiddleware/commit/?id=d3352ff422db6ba6a5e7bd4f7220af0d97efd0ac
Submitter: Zuul
Branch: master

commit d3352ff422db6ba6a5e7bd4f7220af0d97efd0ac
Author: Chris Dent <email address hidden>
Date: Tue Feb 20 10:31:49 2018 +0000

    Identify the keystone service when raising 503

    When the keystonemiddleware is used directly in the WSGI stack of an
    application, the 503 that is raised when the keystone service errors
    or cannot be reached needs to identify that keystone is the service
    that has failed, otherwise it appears to the client that it is the
    service they are trying to access is down, which is misleading.

    This addresses the problem in the most straightforward way possible:
    the exception that causes the 503 is given a message including the
    word "Keystone".

    The call method in BaseAuthTokenTestCase gains an
    expected_body_string kwarg. If not None, the response body (as
    a six.text_type) is compared with the value.

    Change-Id: Idf211e7bc99139744af232f5ea3ecb4be41551ca
    Closes-Bug: #1747655
    Closes-Bug: #1749797
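
The approach described in the commit message can be sketched roughly as follows. This is a standalone illustration, not the keystonemiddleware code: `TokenCheck`, `IdentityUnreachable`, and the `validate` callable are hypothetical names, and the message wording is only an example. The point is that the 503 is produced where the failure is detected, with a body that names Keystone.

```python
class IdentityUnreachable(Exception):
    """Hypothetical error: the identity service errored or is unreachable."""


class TokenCheck(object):
    """Hypothetical stand-in for an auth middleware in a WSGI stack."""

    def __init__(self, app, validate):
        self.app = app
        # validate(token) is assumed to raise IdentityUnreachable on failure
        # to contact the identity service.
        self.validate = validate

    def __call__(self, environ, start_response):
        token = environ.get('HTTP_X_AUTH_TOKEN')
        try:
            self.validate(token)
        except IdentityUnreachable:
            # Name the failing dependency so the client does not conclude
            # that the service it was calling is the one that is down.
            body = b'The Keystone service is temporarily unavailable.'
            start_response('503 Service Unavailable', [
                ('Content-Type', 'text/plain'),
                ('Content-Length', str(len(body))),
            ])
            return [body]
        return self.app(environ, start_response)
```

Because the message is set inside the middleware, every application that stacks it gets the clearer text without needing a wrapper of its own.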

Changed in keystonemiddleware:
status: In Progress → Fix Released
Chris Dent (cdent)
Changed in nova:
status: Confirmed → Invalid
OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/keystonemiddleware 5.0.0

This issue was fixed in the openstack/keystonemiddleware 5.0.0 release.

berkninan (berkninan) wrote :

A 503 Service Unavailable error is an HTTP response status code indicating that the web server is running properly but temporarily cannot handle the request. This error can happen for a wide variety of reasons. Normally it is due to temporary overloading or to maintenance being performed on the server, and it is resolved after a period of time or once another thread has been released by the web server application. The following points are possible fixes aimed at the potential root causes:

* Reload (refresh) the page
* Scan for malware
* Visit the website later
* Contact the server admin

http://net-informations.com/q/mis/503.html
