hook for storage: storage already attached - Juju using already mounted disk

Bug #1674148 reported by Ante Karamatić
Affects          Status        Importance  Assigned to
Canonical Juju   Fix Released  High        Andrew Wilkins
  juju 2.1       Fix Released  High        Andrew Wilkins
  juju 2.2       Fix Released  High        Andrew Wilkins
MAAS             Fix Released  Critical    Blake Rouse

Bug Description

When deploying Ceph OSD with MAAS provider:

juju deploy cs:xenial/ceph-osd --storage osd-journals=ssd,1 --storage osd-devices=sata,1 osd1

Juju hits bug 1656258, which is mitigated by a machine restart. Once the node reboots, the osd1 unit log shows:

2017-03-19 19:15:16 ERROR juju.worker.dependency engine.go:547 "uniter" manifold worker returned unexpected error: preparing operation "run storage-attached (osd-journals/34) hook": inappropriate "storage-attached" hook for storage "osd-journals/34": storage already attached
2017-03-19 19:15:19 ERROR juju.worker.uniter agent.go:28 resolver loop error: preparing operation "run storage-attached (osd-journals/34) hook": inappropriate "storage-attached" hook for storage "osd-journals/34": storage already attached

juju list-storage shows:

osd1/0 osd-devices/33 pending
osd1/0 osd-journals/34 /dev/disk/by-dname/sde pending

The machine has two SSD disks and multiple SATA disks, tagged 'ssd' and 'rotary' respectively. Storage pools were created as:

juju create-storage-pool sata maas tags=rotary
juju create-storage-pool ssd maas tags=ssd

On this machine sde is already used for the operating system.

Shouldn't Juju skip disks that are already in use?

Revision history for this message
Andrew Wilkins (axwalk) wrote :

Ante, it seems there are a couple of issues here:
 - we shouldn't be selecting sde
 - storage-attached is running after Juju thinks the storage is provisioned

When we acquire a MAAS node, we specify disk params (size & tags), and MAAS returns a device that matches them. We explicitly include a root disk in there, but don't report it in the Juju model. So either MAAS is giving us back the root disk, or we're doing the mapping between MAAS-reported disks and what Juju asked for incorrectly. I can't immediately see any issues with the code on our end. Are you able to reproduce this? Can you provide the HTTP requests/responses from Juju to MAAS?

As for storage-attached after provisioning... I've not seen that before. If you have an environment where you can repro and I can poke around, that would be most helpful.

Changed in juju:
status: New → Triaged
importance: Undecided → High
milestone: none → 2.2.0
Revision history for this message
Anastasia (anastasia-macmood) wrote :

I am looping in MAAS team.

Changed in juju:
status: Triaged → Incomplete
milestone: 2.2.0 → 2.2-rc1
Revision history for this message
Blake Rouse (blake-rouse) wrote :

For now I am going to mark this as Incomplete, since it has not been ruled out that this is even a MAAS problem. If you identify it as a MAAS issue, please provide the logs and the HTTP requests and responses for the allocate API so we can check whether we are providing the expected results.

Changed in maas:
status: New → Incomplete
Revision history for this message
Ante Karamatić (ivoks) wrote :

Yes, I can reproduce this easily, but next time frame for that test is Thursday/Friday.

Changed in juju:
milestone: 2.2-rc1 → none
Ryan Beisner (1chb1n)
tags: added: uosci
Revision history for this message
Ante Karamatić (ivoks) wrote :

Before the OSD unit is deployed, one can see:

0bfa847e-cb1e-4576-8ff0-8ecb006b3369: machine-0 2017-03-31 01:50:19 INFO juju.worker.diskmanager diskmanager.go:67 block devices changed: [{sda [/dev/disk/by-id/ata-ST91000640NS_9XG9N7Q9 /dev/disk/by-id/wwn-0x5000c50087dc93c8 /dev/disk/by-path/pci-0000:03:00.0-sas-exp0x500605b0000272bf-phy0-lun-0] ata-ST91000640NS_9XG9N7Q9 953869 false } {sdb [/dev/disk/by-id/ata-ST91000640NS_9XG9N822 /dev/disk/by-id/wwn-0x5000c50087dcb65b /dev/disk/by-path/pci-0000:03:00.0-sas-exp0x500605b0000272bf-phy1-lun-0] ata-ST91000640NS_9XG9N822 953869 false } {sdc [/dev/disk/by-id/ata-ST91000640NS_9XG9MXZH /dev/disk/by-id/wwn-0x5000c50087dcbc00 /dev/disk/by-path/pci-0000:03:00.0-sas-exp0x500605b0000272bf-phy2-lun-0] ata-ST91000640NS_9XG9MXZH 953869 false } {sdd [/dev/disk/by-id/ata-ST91000640NS_9XG9D00P /dev/disk/by-id/wwn-0x5000c50087d3502f /dev/disk/by-path/pci-0000:03:00.0-sas-exp0x500605b0000272bf-phy3-lun-0] ata-ST91000640NS_9XG9D00P 953869 false } {sde [/dev/disk/by-dname/sde /dev/disk/by-id/ata-INTEL_SSDSC2BX480G4_BTHC63800AYB480MGN /dev/disk/by-id/wwn-0x55cd2e404c356cbe /dev/disk/by-path/pci-0000:03:00.0-sas-exp0x500605b0000272bf-phy4-lun-0] ata-INTEL_SSDSC2BX480G4_BTHC63800AYB480MGN 457862 true } {sdf [/dev/disk/by-id/ata-INTEL_SSDSC2BX480G4_BTHC63800AXK480MGN /dev/disk/by-id/wwn-0x55cd2e404c356c6a /dev/disk/by-path/pci-0000:03:00.0-sas-exp0x500605b0000272bf-phy5-lun-0] ata-INTEL_SSDSC2BX480G4_BTHC63800AXK480MGN 457862 false }]

All disks are marked as 'false', except sde. On this machine sde is the only used disk; it's partitioned and sde1 is mounted as /. Later on:

0bfa847e-cb1e-4576-8ff0-8ecb006b3369: unit-osd1-0 2017-03-31 01:50:20 DEBUG juju.worker.uniter.remotestate watcher.go:379 got storage change: [osd-devices/0 osd-journals/1] ok=true
0bfa847e-cb1e-4576-8ff0-8ecb006b3369: unit-osd1-0 2017-03-31 01:50:20 DEBUG juju.worker.uniter.storage resolver.go:125 next hook op for storage-osd-devices-0: {Kind:0 Life:alive Attached:false Location:}
0bfa847e-cb1e-4576-8ff0-8ecb006b3369: unit-osd1-0 2017-03-31 01:50:20 DEBUG juju.worker.uniter.storage resolver.go:125 next hook op for storage-osd-journals-1: {Kind:1 Life:alive Attached:true Location:/dev/disk/by-dname/sde}

At that point the ceph-osd jujud wasn't coming up; it was stuck in the 'agent initializing' state. After 5 minutes, I decided to restart jujud-unit-osd1-0. After that the agent state is failed and the logs are filling with:

2017-03-31 02:06:14 DEBUG juju.worker.uniter.storage resolver.go:125 next hook op for storage-osd-devices-0: {Kind:0 Life:alive Attached:false Location:}
2017-03-31 02:06:14 DEBUG juju.worker.uniter.storage resolver.go:125 next hook op for storage-osd-journals-1: {Kind:1 Life:alive Attached:true Location:/dev/disk/by-dname/sde}
2017-03-31 02:06:14 DEBUG juju.worker.uniter.operation executor.go:69 running operation run storage-attached (osd-journals/1) hook
2017-03-31 02:06:14 DEBUG juju.worker.uniter uniter.go:558 acquire lock "machine-lock" for uniter hook execution
2017-03-31 02:06:14 DEBUG juju.worker.uniter uniter.go:563 lock "machine-lock" acquired
2017-03-31 02:06:14 DEBUG juju.worker.uniter.operation executor.go:100 pre...


Revision history for this message
Ante Karamatić (ivoks) wrote :

Looking at the logs, this is the communication between Juju and MAAS:

3976a927-4567-434c-8700-e8441b62f884: machine-0 2017-04-04 06:11:48 TRACE maas controller.go:733 request a87: POST http://172.16.7.2/MAAS/api/2.0/machines/?op=allocate, params=agent_name=7877e612-f8a7-487d-8fc9-0738b87d4940&storage=root%3A0%2C3%3A1%28rotary%29%2C4%3A1%28ssd%29%2C5%3A1%28ssd%29&zone=default
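The `storage` query parameter in that request is URL-encoded; decoded, it reads `root:0,3:1(rotary),4:1(ssd),5:1(ssd)`, i.e. a list of `label:size(tags)` constraints. A minimal Python sketch of decoding and parsing it (illustrative only — `parse_storage_param` is a hypothetical helper; the authoritative grammar is in the MAAS allocate API documentation):

```python
import re
from urllib.parse import unquote

# Decoded from the request above:
# root:0,3:1(rotary),4:1(ssd),5:1(ssd)
raw = unquote("root%3A0%2C3%3A1%28rotary%29%2C4%3A1%28ssd%29%2C5%3A1%28ssd%29")

def parse_storage_param(value):
    """Parse a MAAS allocate 'storage' parameter into
    (label, size, tags) tuples. Sizes are in GB; a size of 0
    (as for 'root') means 'any size'."""
    out = []
    for item in value.split(","):
        m = re.fullmatch(r"([^:]+):(\d+)(?:\(([^)]*)\))?", item)
        label, size, tags = m.group(1), int(m.group(2)), m.group(3)
        out.append((label, size, tags.split(",") if tags else []))
    return out

print(parse_storage_param(raw))
# [('root', 0, []), ('3', 1, ['rotary']), ('4', 1, ['ssd']), ('5', 1, ['ssd'])]
```

Each non-root label ('3', '4', '5') corresponds to one Juju storage instance, which MAAS must map back to concrete device IDs in `constraints_by_type` in its response.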

MAAS responded with:

3976a927-4567-434c-8700-e8441b62f884: machine-0 2017-04-04 06:11:49 TRACE maas controller.go:741 response a87: {
    "interface_set": [
        {
            "system_id": "kqmche",
...
    "physicalblockdevice_set": [
        {
            "available_size": 1000204886016,
            "serial": "9XG9N8F7",
            "system_id": "kqmche",
            "resource_uri": "/MAAS/api/2.0/nodes/kqmche/blockdevices/38/",
            "model": "ST91000640NS",
            "uuid": null,
            "name": "sdf",
            "tags": [
                "rotary",
                "sata",
                "7200rpm"
            ],
            "type": "physical",
            "partitions": [],
            "used_for": "Unused",
            "filesystem": null,
            "id": 38,
            "id_path": "/dev/disk/by-id/wwn-0x5000c50087dc5d29",
            "path": "/dev/disk/by-dname/sdf",
            "used_size": 0,
            "partition_table_type": null,
            "size": 1000204886016,
            "block_size": 4096
        },
        {
            "available_size": 1000204886016,
            "serial": "9XG9MNF8",
            "system_id": "kqmche",
            "resource_uri": "/MAAS/api/2.0/nodes/kqmche/blockdevices/39/",
            "model": "ST91000640NS",
            "uuid": null,
            "name": "sdd",
            "tags": [
                "rotary",
                "sata",
                "7200rpm"
            ],
            "type": "physical",
            "partitions": [],
            "used_for": "Unused",
            "filesystem": null,
            "id": 39,
            "id_path": "/dev/disk/by-id/wwn-0x5000c50087dcc191",
            "path": "/dev/disk/by-dname/sdd",
            "used_size": 0,
            "partition_table_type": null,
            "size": 1000204886016,
            "block_size": 4096
        },
        {
            "available_size": 480103981056,
            "serial": "BTHC638106R7480MGN",
            "system_id": "kqmche",
            "resource_uri": "/MAAS/api/2.0/nodes/kqmche/blockdevices/40/",
            "model": "INTEL SSDSC2BX48",
            "uuid": null,
            "name": "sdb",
            "tags": [
                "ssd",
                "sata"
            ],
            "type": "physical",
            "partitions": [],
            "used_for": "Unused",
            "filesystem": null,
            "id": 40,
            "id_path": "/dev/disk/by-id/wwn-0x55cd2e404c35a551",
            "path": "/dev/disk/by-dname/sdb",
            "used_size": 0,
            "partition_table_type": null,
            "size": 480103981056,
            "block_size": 4096
        },
        {
            "available_size": 1000204886016,
            "serial": "9XG9N5LS",
            "system_id": "kqmche",
            "resource_uri": "/MAAS/api/2.0...

Changed in maas:
status: Incomplete → New
Revision history for this message
Andrew Wilkins (axwalk) wrote :

Block devices got truncated. Here's block device 43, which holds the root fs:

{
    "available_size": 0,
    "serial": "BTHC63800AL0480MGN",
    "system_id": "kqmche",
    "resource_uri": "/MAAS/api/2.0/nodes/kqmche/blockdevices/43/",
    "model": "INTEL SSDSC2BX48",
    "uuid": null,
    "name": "sda",
    "tags": [
        "ssd",
        "sata"
    ],
    "type": "physical",
    "partitions": [
        {
            "filesystem": {
                "fstype": "ext4",
                "uuid": "be843551-9697-4944-b96d-e22787680b86",
                "mount_options": "",
                "mount_point": "/",
                "label": ""
            },
            "bootable": false,
            "device_id": 43,
            "system_id": "kqmche",
            "id": 24,
            "uuid": "ffe4f3b3-27be-4d50-a7a1-b10fac8e4839",
            "resource_uri": "/MAAS/api/2.0/nodes/kqmche/blockdevices/43/partition/24",
            "size": 480096813056,
            "type": "partition",
            "path": "/dev/disk/by-dname/sda-part1",
            "used_for": "ext4 formatted filesystem mounted at /"
        }
    ],
    "used_for": "MBR partitioned with 1 partition",
    "filesystem": null, ...


Revision history for this message
Andres Rodriguez (andreserl) wrote :

@Ante,

Can you please provide:

1. The curtin config: maas machine get-curtin-config <system_id>
2. The commissioning script for storage discovery.
3. The output of the following on the deployed machine that failed your Juju request:

sudo lsblk --exclude 1,2,7 -d -P -o NAME,RO,RM,MODEL,ROTA,MAJ:MIN,SERIAL -x MAJ:MIN

Changed in maas:
status: New → Incomplete
Revision history for this message
Ante Karamatić (ivoks) wrote :
Changed in maas:
status: Incomplete → New
Revision history for this message
Ante Karamatić (ivoks) wrote :

Keep in mind that this is a new deployment, so IDs/disks might have changed.

Revision history for this message
Blake Rouse (blake-rouse) wrote :

Ante,

You mention constraints_by_type in your other comment but you did not provide the output. Can you provide the entire JSON response from MAAS during the allocation? That way I can check whether MAAS is giving a correct response to Juju.

Thanks,
Blake

Changed in maas:
importance: Undecided → Critical
milestone: none → 2.2.0rc2
status: New → Confirmed
Revision history for this message
Ante Karamatić (ivoks) wrote :

Blake, it's in comment 6. Here's another output from 'maas maas machines allocate storage="root:0,2:1(ssd),3:1(sata)"'

It says that '/' is on id 8, but it's actually on id 11.

Chris Gregan (cgregan)
tags: added: stable-blocker
tags: added: cdo-qa-blocker
Revision history for this message
Jorge Niedbalski (niedbalski) wrote :

@andreserl,

Could you check if LP: #1677001 is related to this bug?

Thanks.

Changed in maas:
assignee: nobody → Blake Rouse (blake-rouse)
status: Confirmed → In Progress
Changed in maas:
status: In Progress → Fix Committed
Revision history for this message
Anastasia (anastasia-macmood) wrote :

Marking as Invalid for juju - MAAS bug only.

Changed in juju:
status: Incomplete → Invalid
Revision history for this message
Andres Rodriguez (andreserl) wrote : Re: [Bug 1674148] Re: hook for storage: storage already attached - Juju using already mounted disk

@ante,

Looking at the output, the result seems correct. You have requested disks
root,0,1. Root is 199, which holds both the MBR and /. If you didn't
specify "root", Juju has added it itself, on the assumption that the first
disk is always the root disk.
On Sat, Apr 15, 2017 at 2:25 PM Ante Karamatić <email address hidden>
wrote:

> With MAAS 2.2.0~rc2+bzr5983, I still get:
>
> ...
> "used_for": "MBR partitioned with 1 partition",
> "block_size": 4096,
> "model": "INTEL SSDSC2BX48",
> "path": "/dev/disk/by-dname/sde",
> "partition_table_type": "MBR",
> "id": 199,
> "used_size": 480102055936,
> "id_path": "/dev/disk/by-id/wwn-0x55cd2e404c356cc7",
> "size": 480103981056,
> "resource_uri": "/MAAS/api/2.0/nodes/dx68b6/blockdevices/199/",
> "uuid": null,
> "partitions": [
> {
> "system_id": "dx68b6",
> "filesystem": {
> "fstype": "ext4",
> "mount_point": "/",
> "uuid": "c348c091-fe4f-4766-80ff-3d5da3278031",
> "label": "",
> "mount_options": ""
> },
> "used_for": "ext4 formatted filesystem mounted at /",
> "resource_uri":
> "/MAAS/api/2.0/nodes/dx68b6/blockdevices/199/partition/45",
> "bootable": false,
> "path": "/dev/disk/by-dname/sde-part1",
> "size": 480096813056,
> "uuid": "0a762d55-5c87-46df-af76-a6417fb76041",
> "id": 45,
> "type": "partition",
> "device_id": 199
> }
> ],
> "available_size": 0,
> "system_id": "dx68b6",
> "name": "sde",
> "filesystem": null,
> "type": "physical",
> "serial": "BTHC63800AYL480MGN"
> },
>
> ...
> "constraints_by_type": {
> "storage": {
> "0": [
> 195
> ],
> "1": [
> 200
> ],
> "root": [
> 199
> ]
> }
> },
>
>
> Does one need to recommission the nodes?
>

Revision history for this message
Ante Karamatić (ivoks) wrote :

Yes. Unfortunately, one can't prevent LP from sending an email or cancel the comment. As soon as I commented, I realized that MAAS is doing the right thing. I've hidden the comment.

So, now MAAS is doing the right thing, but Juju's agent still doesn't start. I'll do more testing and provide debugging data tomorrow.


Changed in maas:
status: Fix Committed → Fix Released
Revision history for this message
Ante Karamatić (ivoks) wrote :

For juju see bug 1677001

Changed in juju:
status: Invalid → Confirmed
Revision history for this message
Andrew Wilkins (axwalk) wrote :

Andres, I'm not sure I understand your reply in #16. It seems you think that we're not getting a disk back, but that's not the case. The issue is that they're not being mapped from labels as expected.

What we expect is that the first constraint in the list is mapped to the root filesystem's block device. When I looked at the MAAS code it appeared that there's code to do that. But from the output, you can see that the ID in constraints_by_type for "root" does not match the ID for the "/" block device in physicalblockdevice_set.
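The mismatch Andrew describes can be checked mechanically: the device ID that MAAS reports under `constraints_by_type["root"]` should be the physical block device whose partition is mounted at `/`. A minimal Python sketch, with data abridged from the allocate response quoted earlier in this bug (system dx68b6); `root_device_id` is a hypothetical helper for illustration, not Juju's actual code:

```python
# Abridged from the MAAS 2.x allocate response in this bug.
allocation = {
    "physicalblockdevice_set": [
        {"id": 195, "name": "sdf", "partitions": []},
        {"id": 199, "name": "sde",
         "partitions": [{"filesystem": {"mount_point": "/"}}]},
        {"id": 200, "name": "sdb", "partitions": []},
    ],
    "constraints_by_type": {
        "storage": {"0": [195], "1": [200], "root": [199]},
    },
}

def root_device_id(alloc):
    """Return the ID of the block device holding the root filesystem."""
    for dev in alloc["physicalblockdevice_set"]:
        for part in dev.get("partitions", []):
            fs = part.get("filesystem") or {}
            if fs.get("mount_point") == "/":
                return dev["id"]
    return None

claimed = allocation["constraints_by_type"]["storage"]["root"][0]
print("consistent" if claimed == root_device_id(allocation) else "mismatch")
# → consistent
```

In the earlier, buggy output the "root" entry pointed at ID 8 while `/` was actually on ID 11, so this check would have printed "mismatch".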

Revision history for this message
Andrew Wilkins (axwalk) wrote :

(But now I see this is marked Fix Committed... was there a bug after all?)

Revision history for this message
Tim Penhey (thumper) wrote :

Ante, can you confirm if this is still an issue for Juju?

Changed in juju:
status: Confirmed → Incomplete
Revision history for this message
Ante Karamatić (ivoks) wrote :

I can confirm this is fixed in both MAAS 2.2rcX and juju 2.2b4.

Revision history for this message
Andrew Wilkins (axwalk) wrote :

Turns out this is not completely fixed; I can reproduce it on 2.2.1 with a custom storage charm. The issue only occurs with storage whose name contains a hyphen.
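For illustration, here is the class of failure a hyphenated storage name invites once a storage ID like `osd-journals/1` is flattened into a single token such as `storage-osd-journals-1` (as seen in the uniter log keys above). This is a hedged sketch of the bug class only, not Juju's actual resolver code:

```python
# Flattening "name/ordinal" into "name-ordinal" is lossy when the
# name itself contains hyphens: only a split at the LAST hyphen
# recovers the original ID. These helpers are hypothetical,
# written to illustrate the ambiguity.

def flatten(storage_id):
    # "osd-journals/1" -> "osd-journals-1"
    return storage_id.replace("/", "-")

def unflatten_naive(token):
    # WRONG for hyphenated names: splits at the first hyphen.
    name, _, num = token.partition("-")
    return f"{name}/{num}"

def unflatten_correct(token):
    # Split at the last hyphen, which separates the ordinal.
    name, _, num = token.rpartition("-")
    return f"{name}/{num}"

tok = flatten("osd-journals/1")
print(unflatten_naive(tok))    # → osd/journals-1  (mangled)
print(unflatten_correct(tok))  # → osd-journals/1
```

A name without hyphens round-trips correctly through either helper, which would explain why the bug surfaces only for hyphenated storage names.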

Changed in juju:
status: Incomplete → Triaged
Revision history for this message
Andrew Wilkins (axwalk) wrote :
Revision history for this message
Andrew Wilkins (axwalk) wrote :
Changed in juju:
assignee: nobody → Andrew Wilkins (axwalk)
milestone: none → 2.3-alpha1
Changed in juju:
milestone: 2.3-beta1 → 2.3-beta2
Revision history for this message
Anastasia (anastasia-macmood) wrote :

I suspect that the fix for this has already been ported into 2.3 (develop) as part of one of the larger merges we have undertaken recently.

Changed in juju:
status: Triaged → Fix Committed
Changed in juju:
milestone: 2.3-beta2 → 2.3-beta1
Changed in juju:
status: Fix Committed → Fix Released