Cinder manage now requires a pool name, which administrators have no way to discover

Bug #1364279 reported by Navneet
Affects: Cinder
Status: Fix Released
Importance: Low
Assigned to: Huang Zhiteng
Milestone: 2014.2

Bug Description

The new cinder pools submission breaks any cinder operation that takes <host> as an input parameter. I tested the cinder manage/unmanage feature with the latest code. cinder manage breaks if the host is not given in the form host@backend#pool. For example, the following breaks:

cinder manage --source-name vol1 openstack1@iscsi

but this succeeds

cinder manage --source-name vol1 openstack1@iscsi#pool1

Knowledge of pools should be limited to the backend, so any operation that takes <host> as input should not also be burdened with including the pool name. This should be looked into at high priority.
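For reference, the <host> string in the examples above is a three-segment identifier. A minimal sketch of how it splits apart (this is an illustration, not Cinder's actual parsing code; the helper name `split_host` is hypothetical):

```python
def split_host(host):
    """Split a Cinder host string 'host@backend#pool' into its parts.

    The backend and pool segments are optional; a missing segment is
    returned as None.
    """
    # Separate the optional '#pool' suffix first.
    rest, _, pool = host.partition('#')
    # Then separate the optional '@backend' segment.
    hostname, _, backend = rest.partition('@')
    return hostname, backend or None, pool or None

# The failing form from the report lacks the pool segment:
print(split_host('openstack1@iscsi'))        # ('openstack1', 'iscsi', None)
# The working form carries all three segments:
print(split_host('openstack1@iscsi#pool1'))  # ('openstack1', 'iscsi', 'pool1')
```

The scheduler in the pool-aware code path expects the third segment, which is why the shorter form fails.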

Navneet (singn)
tags: removed: netapp
Rushi Agrawal (rushiagr)
Changed in cinder:
status: New → Confirmed
Revision history for this message
Rushi Agrawal (rushiagr) wrote :

I'm able to reproduce this behavior with DevStack-LVM setting.

Only something like

$ cinder manage --source-name volume-a618c57e-acde-40a3-bda9-cc12cbef0758 ra@lvmdriver-1#lvmdriver-1

succeeds. If the part starting from '#' is not specified, the volume appears in the 'cinder list', but is in an 'error' state. And in the scheduler log, it says:

2014-09-04 20:58:41.075 ERROR cinder.scheduler.manager [req-f55acb5b-58db-4805-9fd2-0e22ef5d2b6e ddd702b542b44d30a871abe64eabd781 a9c3b3cc2f554ef88115b2262f2e393c] Failed to schedule_manage_existing: No valid host was found. Cannot place volume ecc02267-c3ba-4d42-8323-1ccd14c10bb2 on ra@lvmdriver-1

Another thing I found (maybe it constitutes a separate bug): when the host is not in the proper format (<hostname>@<backend-name>), or there is a typo in the host, the error message is 'Service not found'. The word 'service' is slightly confusing IMO; 'Backend not found' or 'Host not found' would make more sense...

Revision history for this message
Huang Zhiteng (zhiteng-huang) wrote :

Navneet and Rushi, this is expected behavior. I actually mentioned this in the commit message for the pool-aware scheduler patch (https://review.openstack.org/#/c/98715/29; see the 'User Visible Change' section at the end of the commit message).

Changed in cinder:
status: Confirmed → Invalid
Revision history for this message
Ben Swartzlander (bswartz) wrote :

I think this is a valid bug. While the change to the operation of the migrate feature may be intended, the fact that there's no way for an administrator to obtain a list of pools means administrators can't use the cinder migrate feature in practice unless they learn the pool names through some out-of-band mechanism, such as reading log files or knowing the details of the storage controller.

I see three ways to address this problem:
1) We could add a new admin-only API to list the pools for Juno-RC1
2) We could change the way migrate works so that it accepts the old format and involves the scheduler, which would somehow pick one of the pools on the specified backend
3) We could leave it broken until Kilo and add the pool-list API then

(1) seems like the least evil choice to me, because (2) is a major change in functionality and (3) sucks for administrators

Changed in cinder:
status: Invalid → Confirmed
summary: - Cinder pools breaks operations involving host
+ Cinder manage now requires a pool name, which administrators have no way
+ to discover
Revision history for this message
Navneet (singn) wrote :

In my honest opinion, if I were an operator and not a cinder developer, I would say it's not easy, and an unnecessary tyranny on admins.

It's a bug because:

1. It breaks already-working cinder calls requiring <host>, and is hence a functional regression for operations like manage, migrate, etc. It does this even for backends that don't support pools.
2. It affects calls to otherwise stable APIs without any notice to devs and operators.
3. It expects operators to read the OpenStack documentation, plus each vendor's section, to see whether a driver supports pools and, if it does, how the pool name maps to the backend container name.
4. It also expects them to dig pool names out of logs, which is far too dev-centric an activity.
5. If I don't use the cinder command-line client and am fond of a UI such as Horizon or an operator's custom UI, I cannot simply choose a host from a drop-down list. I need internal understanding of pools and have to type the name into, perhaps, a future text field to supply the right host value.
6. If I am using the cinder API and have built my own client for managing my cloud environment, the above point is complicated even further, as I have no programmatic way of finding a pool and then stitching that value into <host>.

At the very minimum there should be a way to get the list of pools in the current approach and show them to admins, or return them in an API call. This is not sufficient to clean up the mess, but it is at least some relief for admins.

During my discussions with the community and core members while deciding on an implementation approach for pools, it was highlighted that there needs to be some approach/API to list and manage/control pools within a backend. I think it's now time we take that suggestion seriously and at least start discussions on it, with pros and cons.

Revision history for this message
John Griffith (john-griffith) wrote :

I'd agree we need to fix this up, especially in the case of drivers like LVM that don't even use pools. Two points from my opinion:
1. We should be able to just ignore the pool designation for devices that don't have/support pools
2. We most certainly should be able to give the admin a method for discovering pools

This is where my concerns regarding late merges of large features like this come into play. I think the feature implementation done here is great work, but given the timing of its merge we now have to scramble for all the vendors to implement it and start finding the corner cases that were missed in the original implementation. Not a fun place to be IMO.

Revision history for this message
Ben Swartzlander (bswartz) wrote :

I think the issue with merging features late and having time to deal with issues is a legit one, but that's why the feature freeze cutoff is milestone-3 and not milestone-4. We've got a couple of weeks to sort out this bug so it's not like an emergency situation.

Revision history for this message
Walt Boring (walter-boring) wrote :

I would hope this type of thing would have been caught in the gate. Unfortunately, tempest doesn't use the command line clients to test anything. Tempest only uses the cinder API directly, so there is a very large gap in automated testing that we are exposed to, and this defect shows it.

Revision history for this message
Ben Swartzlander (bswartz) wrote :

Not only does tempest not test migration, but tempest doesn't test anything that requires multiple backends. The existing automated test systems only catch a few very minor classes of bugs. We need dramatically expanded automated testing capabilities -- probably more than what's possible with tempest. I have ideas I might mention in Paris...

Changed in cinder:
assignee: nobody → Huang Zhiteng (zhiteng-huang)
status: Confirmed → In Progress
Changed in cinder:
importance: Undecided → Low
Revision history for this message
John Griffith (john-griffith) wrote :

This isn't the place for this discussion, but since it's been started already I'd like to add some notes. I'd also like to ask that any more of this sort of comment/discussion take place on the ML, or at least in IRC.

False statements:
>> Unfortunately, tempest doesn't use the command line clients to test anything. Tempest only uses the cinder API directly

Not true at all; in fact, I believe almost everything has been converted such that it uses the clients.

>> but tempest doesn't test anything that requires multiple backends.

There are tests in Tempest that exercise multi-backend, particularly type mapping etc. These require multiple backends to be configured, of course, which a while back we were doing but have since turned back off. I don't recall the reason for this but can look into it.

>> We need to dramatically expand automated testing capabilities

Yes, please... why wait for Paris? If you have ideas, propose them; if you have code, submit it to Tempest... no reason to wait when it comes to testing, IMO.

>>I think the issue with merging features late and having time to deal with issues is a legit one, but that's why the feature freeze cutoff is milestone-3 and not milestone-4.

Except that NOW, instead of dealing with issues that have been around for a while, we're spending time and effort on new problems we've introduced. If you introduce five new bugs for every one you fix, you're never really making forward progress.

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to cinder (master)

Reviewed: https://review.openstack.org/119938
Committed: https://git.openstack.org/cgit/openstack/cinder/commit/?id=80b14a54710dad659fde3e319f8b0cf0e673acdd
Submitter: Jenkins
Branch: master

commit 80b14a54710dad659fde3e319f8b0cf0e673acdd
Author: Zhiteng Huang <email address hidden>
Date: Mon Sep 8 14:42:39 2014 -0700

    Allow scheduler pool information to be retrieved

    With pool support added to Cinder, now we are kind of in an awkward
    situation where we require admin to input exact location for volumes
    to-be managed (imported) or migrated, which must have pool info, but
    there is no way to find out what pools are there for backends except
    looking at the scheduler log. That causes bad user experience, and
    thus is a bug from UX POV.

    This change simply adds a new admin-api extension to allow admin to
    fetch all the pool information from scheduler cache (memory), which
    closes the gap for end users.

    This extension provides two level of pool information: names only or
    detailed information:

    Pool name only:
    GET http://CINDER_API_ENDPOINT/v2/TENANT_ID/scheduler-stats/get_pools

    Detailed Pool info:
    GET http://CINDER_API_ENDPOINT/v2/TENANT_ID/scheduler-stats/get_pools
    \?detail\=True

    Closes-bug: #1364279

    Change-Id: I445d4e472c83db2f2d8db414de139c87d09f8fda
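Given the new extension, a client can recover valid host strings programmatically instead of reading scheduler logs. A brief sketch of consuming such a response (the exact response schema is assumed here to be the conventional {"pools": [{"name": ...}]} shape; the helper name `pool_names` is hypothetical):

```python
import json

def pool_names(response_body):
    """Extract pool names from a scheduler-stats/get_pools response body.

    Assumes the response shape {"pools": [{"name": "host@backend#pool"}, ...]};
    returns an empty list when no pools are reported.
    """
    data = json.loads(response_body)
    return [pool['name'] for pool in data.get('pools', [])]

# Example body an admin might receive from GET .../scheduler-stats/get_pools:
body = '{"pools": [{"name": "ra@lvmdriver-1#lvmdriver-1"}]}'
print(pool_names(body))  # ['ra@lvmdriver-1#lvmdriver-1']
```

Each returned name is already in the host@backend#pool form that cinder manage and migrate expect, so it can be passed through unchanged.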

Changed in cinder:
status: In Progress → Fix Committed
Thierry Carrez (ttx)
Changed in cinder:
milestone: none → juno-rc1
status: Fix Committed → Fix Released
Thierry Carrez (ttx)
Changed in cinder:
milestone: juno-rc1 → 2014.2