restore-storage-configuration fails when a raid device is part of a bcache

Bug #1815091 reported by Jason Hobbs
This bug affects 2 people
Affects  Status        Importance  Assigned to  Milestone
MAAS     Fix Released  Critical    Blake Rouse
2.5      Fix Released  Critical    Blake Rouse

Bug Description

This is with maas 2.5.0.

To reproduce, configure a raid device for a machine, and then make that raid device part of a bcache (in my test case, I made it the backing device).

Then use the 'restore-storage-configuration' API call to restore the storage configuration to the default. The call fails with this error:

ubuntu@dratini:~$ maas root machine restore-storage-configuration drt3nq
Cannot delete block device because its part of a Bcache.

The expected behavior is that MAAS restores the default storage configuration regardless of the machine's current storage layout.

Jason Hobbs (jason-hobbs) wrote :

The only workaround I'm aware of for this is to re-implement restore-storage-configuration outside of MAAS through API calls. That is a huge, complex mess because we have to be very careful about the order in which storage devices are removed, and it is extremely slow, doubling the time it takes to reconfigure nodes in MAAS.
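
The ordering constraint described above can be sketched as a dependency sort: a device may only be removed once nothing is stacked on top of it. The following is a minimal illustration, not MAAS code. The device names and the `built_on` mapping are hypothetical, and it assumes Python 3.9+ for `graphlib`.

```python
from graphlib import TopologicalSorter

# Hypothetical layout matching the report: a bcache whose backing device
# is a RAID array built from two physical disks. Names are illustrative.
# Each key maps a device to the devices it is built on.
built_on = {
    "bcache0": {"md0", "sdc"},  # backing device md0, cache device sdc
    "md0": {"sda", "sdb"},
}


def removal_order(built_on):
    """Return devices so that each one precedes the devices it is built on."""
    # TopologicalSorter emits a node only after all of its predecessors.
    # Declaring "whatever is stacked on top of X" as X's predecessors
    # therefore puts the top of the stack first.
    stacked_on_top = {}
    for dev, parts in built_on.items():
        stacked_on_top.setdefault(dev, set())
        for part in parts:
            stacked_on_top.setdefault(part, set()).add(dev)
    return list(TopologicalSorter(stacked_on_top).static_order())


order = removal_order(built_on)
print(order)  # bcache0 comes first, then md0, then the disks under md0
```

In a real workaround only the virtual devices (the bcache and the RAID) would be deleted; the physical disks appear in the order only to show where they would be wiped or repartitioned.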

Jason Hobbs (jason-hobbs) wrote :

sub'd to field-high

tags: added: cdo-qa
Blake Rouse (blake-rouse) wrote :

This is most likely due to the nesting of the virtual block devices. Can you provide a stack trace of the error so I can see exactly where in the code it occurs?

Changed in maas:
status: New → Incomplete
importance: Undecided → Critical
assignee: nobody → Blake Rouse (blake-rouse)
milestone: none → 2.6.0
Jason Hobbs (jason-hobbs) wrote :

There is no traceback, just this:

2019-02-08 00:31:02 regiond: [info] 10.245.208.5 POST /MAAS/api/2.0/machines/drt3nq/?op=restore_storage_configuration HTTP/1.1 --> 400 BAD_REQUEST (referrer: -; agent: Python-httplib2/0.9.2 (gzip))

But you can reproduce this anywhere, so it shouldn't be hard to reproduce on your side.

Changed in maas:
status: Incomplete → New
Blake Rouse (blake-rouse) wrote :

Strange that it shows as a 400 error. I expected a 500 error, which is why I asked for the trace. I will reproduce it manually, thanks!
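
The 400 versus 500 distinction can be read straight from the regiond access log. A 400 generally means the server handled the request and rejected it deliberately, which is consistent with no traceback being logged. The sketch below extracts the method, path, and status from a line in the format quoted earlier in this report; the regular expression is an assumption based on that single sample, not a documented log format.

```python
import re

# Pattern inferred from the regiond line quoted in this report.
LOG_RE = re.compile(
    r"(?P<method>[A-Z]+) (?P<path>\S+) HTTP/\d\.\d --> "
    r"(?P<status>\d{3}) (?P<reason>\S+)"
)


def parse_regiond_line(line):
    """Return method, path, status and reason, or None if the line differs."""
    match = LOG_RE.search(line)
    if match is None:
        return None
    fields = match.groupdict()
    fields["status"] = int(fields["status"])
    return fields


line = (
    "2019-02-08 00:31:02 regiond: [info] 10.245.208.5 POST "
    "/MAAS/api/2.0/machines/drt3nq/?op=restore_storage_configuration "
    "HTTP/1.1 --> 400 BAD_REQUEST (referrer: -; agent: "
    "Python-httplib2/0.9.2 (gzip))"
)
print(parse_regiond_line(line))
```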

Changed in maas:
status: New → Triaged
Andres Rodriguez (andreserl) wrote :

Setting back as incomplete until this can be reproduced!

Changed in maas:
status: Triaged → Incomplete
Jason Hobbs (jason-hobbs) wrote :

Do you need anything else from me? I've provided complete directions on how to reproduce. Have you attempted to reproduce and failed?

Changed in maas:
status: Incomplete → New
Jason Hobbs (jason-hobbs) wrote :

Set this back to "New" as "Reporter needs to provide more information" is inaccurate unless you actually need more information from me.

Andres Rodriguez (andreserl) wrote : Re: [Bug 1815091] Re: restore-storage-configuration fails when a raid device is part of a bcache

Hi Jason,

Sorry I wasn’t clear, just setting it to incomplete until Blake can
reproduce locally and confirm your reproduction steps yield the same result
for him.

He should mark this confirmed once he is able to confirm.

On Thu, Feb 14, 2019 at 10:25 AM Jason Hobbs <email address hidden>
wrote:

> Do you need anything else from me? I've provided complete directions on
> how to reproduce. Have you attempted to reproduce and failed?
>
> ** Changed in: maas
> Status: Incomplete => New
>
> ** Changed in: maas/2.5
> Status: Incomplete => New
>
> Launchpad-Bug: product=maas; milestone=2.6.0; status=New;
> importance=Critical; <email address hidden>;
> Launchpad-Bug: product=maas; productseries=2.5; milestone=2.5.2;
> status=New; importance=Critical; <email address hidden>;
> Launchpad-Bug-Tags: cdo-qa cpe-onsite foundations-engine
--
Andres Rodriguez (RoAkSoAx)
Ubuntu Server Developer
MSc. Telecom & Networking
Systems Engineer

Changed in maas:
status: New → In Progress
Changed in maas:
status: In Progress → Fix Committed
Changed in maas:
milestone: 2.6.0 → 2.6.0alpha1
Changed in maas:
status: Fix Committed → Fix Released
Dmitrii Shcherbakov (dmitriis) wrote :

Tried this on 2.5.2 - MAAS returns 200 and the configs remain as they were.

https://pastebin.canonical.com/p/QpQKdCnQ3W/

Versions (2.5.2 from -proposed ppa on all 3 nodes):
https://pastebin.canonical.com/p/BczQgM6wK6/

 maas root machine restore-storage-configuration sgtp6b | grep bcache
                    "fstype": "bcache-backing",
    "bcaches": [
                        "fstype": "bcache-backing",
                "fstype": "bcache-backing",
                "fstype": "bcache-backing",
                "fstype": "bcache-backing",
                "fstype": "bcache-backing",
                "fstype": "bcache-backing",
                "fstype": "bcache-cache",
                "fstype": "bcache-cache",
                        "fstype": "bcache-backing",
                "fstype": "bcache-backing",
                "fstype": "bcache-backing",
                "fstype": "bcache-backing",
                "fstype": "bcache-backing",
                "fstype": "bcache-backing",
                "fstype": "bcache-cache",
                "fstype": "bcache-cache",

2019-03-20 11:04:24 regiond: [info] 192.0.2.18 POST /MAAS/api/2.0/machines/sgtp6b/?op=restore_storage_configuration HTTP/1.1 --> 200 OK (referrer: -; agent: Python-httplib2/0.9.2 (gzip))
2019-03-20 11:04:25 regiond: [info] 192.0.2.18 POST /MAAS/api/2.0/machines/ambatm/?op=restore_storage_configuration HTTP/1.1 --> 200 OK (referrer: -; agent: Python-httplib2/0.9.2 (gzip))
2019-03-20 11:04:26 regiond: [info] 192.0.2.18 GET /MAAS/api/2.0/machines/?hostname=control-7&domain=maas HTTP/1.1 --> 200 OK (referrer: -; agent: Python-httplib2/0.9.2 (gzip))
2019-03-20 11:04:26 regiond: [info] 192.0.2.18 GET /MAAS/api/2.0/nodes/sgtp6b/blockdevices/ HTTP/1.1 --> 200 OK (referrer: -; agent: Python-httplib2/0.9.2 (gzip))
2019-03-20 11:04:28 regiond: [info] 192.0.2.18 GET /MAAS/api/2.0/machines/?hostname=compute-3&domain=maas HTTP/1.1 --> 200 OK (referrer: -; agent: Python-httplib2/0
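
Because the call returns 200 either way, the response body has to be inspected to tell whether the restore actually happened. The grep above does this by eye; a rough programmatic equivalent is sketched below. Only the `bcaches` key and the `fstype` values come from the output in this comment. The other field names in the sample payload are hypothetical and heavily trimmed.

```python
import json


def bcache_remnants(node):
    """Recursively collect every "fstype" value that starts with "bcache-"."""
    found = []
    if isinstance(node, dict):
        for key, value in node.items():
            if key == "fstype" and isinstance(value, str) and value.startswith("bcache-"):
                found.append(value)
            else:
                found.extend(bcache_remnants(value))
    elif isinstance(node, list):
        for item in node:
            found.extend(bcache_remnants(item))
    return found


# Hypothetical, trimmed response; the real payload has many more fields.
sample = json.loads("""
{"bcaches": [{"backing_device": {"filesystem": {"fstype": "bcache-backing"}}}],
 "blockdevice_set": [{"filesystem": {"fstype": "bcache-cache"}},
                     {"filesystem": {"fstype": "ext4"}}]}
""")

remnants = bcache_remnants(sample)
print(remnants)  # a non-empty list means bcache configuration is still present
```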

Dmitrii Shcherbakov (dmitriis) wrote :

Code path:

https://paste.ubuntu.com/p/xHpvdvRXGP/

So MAAS clears the storage by applying a default storage layout.

Tracing shows that self.skip_storage is set to True (I used that option during re-commissioning once or twice but it should not affect this operation):

https://paste.ubuntu.com/p/sDS2xjcn3k/

3214 -> if self.skip_storage:
3215 return
3216 storage_layout = Config.objects.get_config("default_storage_layout")

(Pdb) n
> /usr/lib/python3/dist-packages/maasserver/models/node.py(3215)set_default_storage_layout()
-> return

So MAAS simply returns 200 without performing any storage reconfiguration.
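
The effect of that guard can be shown with a small stand-in model. This is not the MAAS source: the `Node` class, the `layout` attribute, the `"flat"` default, and the handler function are all hypothetical, kept only to show how an early return makes the call a silent no-op that still reports success.

```python
class Node:
    """Simplified stand-in for the MAAS node model; not the real class."""

    def __init__(self, skip_storage=False):
        self.skip_storage = skip_storage
        self.layout = "custom bcache-on-raid"  # hypothetical current state

    def set_default_storage_layout(self, default_layout="flat"):
        # Same shape as the guard in the pdb listing: when the flag is
        # set, the method returns early and raises nothing.
        if self.skip_storage:
            return
        self.layout = default_layout


def restore_storage_configuration(node):
    """Hypothetical handler: reports success whether or not anything changed."""
    node.set_default_storage_layout()
    return 200


stale = Node(skip_storage=True)
print(restore_storage_configuration(stale), stale.layout)
```

A caller cannot distinguish the two cases from the status code alone, which matches the behaviour described in this comment.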

Dmitrii Shcherbakov (dmitriis) wrote :

Filed a separate bug for #10 and #11:

https://bugs.launchpad.net/maas/+bug/1820998
