Nodes cannot boot after a storage disk replacement

Bug #1488594 reported by james beedy on 2015-08-25
This bug affects 2 people
Affects            Status   Importance  Assigned to  Milestone
MAAS               Invalid  Undecided   Unassigned
syslinux (Ubuntu)  New      Undecided   Unassigned

Bug Description

I'm experiencing this issue when I replace any OSD disk on any Ceph storage node and then reboot it. Immediately after the node PXE boots, it hangs at a "booting local disk" message and never times out or boots. A workaround I've found is to momentarily stop MAAS from managing the network right after powering on the node whose disk has been replaced. Once the node's PXE boot times out and it falls back to booting from local disk into the OS, I re-enable MAAS management on that network so the node gets an IP, continues the boot process, and eventually boots successfully.

It would be nice to get some feedback on what is going on here, and also a best practice for what/how to proceed in the case when you need to swap storage disks.

Thanks!

maas.log <-- http://paste.ubuntu.com/12193844/

clusterd.log <-- http://paste.ubuntu.com/12193842/

maas - 1.8.0+bzr4001-0ubuntu2~trusty1
trusty - 14.04.3

james beedy (jamesbeedy) wrote :

Here is a shot of the console of a node experiencing the issue.

description: updated
Blake Rouse (blake-rouse) wrote :

Does it just sit at the console prompt? Or does an error appear?

It looks like the BIOS, or PXELINUX for that matter, might enumerate the block devices in a different order, so the first disk is no longer the boot disk.

Changed in maas:
status: New → Incomplete
james beedy (jamesbeedy) wrote :

It just sits.....I let her sit overnight even....no timeout....nothing.

Changed in maas:
status: Incomplete → Confirmed
milestone: none → next
james beedy (jamesbeedy) wrote :

Update:

After a few reboots and swapping back and forth of storage disks....the node I'm experimenting on now fails to boot with the original disk too.

james beedy (jamesbeedy) wrote :

From what I can gather... this issue seems to exist because of stale entries in maasserver_physicalblockdevice and/or maasserver_blockdevice which are inconsistent with the current resources/state of the node.

Might I enquire if/where maas might verify node resources upon power on?

Andres Rodriguez (andreserl) wrote :

Hi James,

Have you tried re-commissioning your node? A re-commissioning should update the storage model. MAAS does not yet provide the ability to update the information about disks/NICs of currently deployed devices; however, if you were to re-commission and re-deploy, this would potentially be fixed.

Blake Rouse (blake-rouse) wrote :

MAAS does not affect the boot process at all. It just tells PXELINUX to boot from the first disk. MAAS does not identify which disk is the first disk; this is done by the BIOS at boot time.

james beedy (jamesbeedy) wrote :

Andres -

Yeah....a re-commissioning will solve the issue....to the extent that I could essentially get my node back and re-deploy ceph-osd and nova-compute IF juju would properly destroy the associated services and machine....but unfortunately no amount or combination of {service, unit, machine}-destroy commands will get rid of the unit, services or machine (see http://paste.ubuntu.com/12194988/).

This is all beside the point that I only need to replace a single disk. It is a far greater task to evacuate the host, re-commission, and redeploy and configure all services, when essentially all I should need to do is swap a disk and run a series of < 5 ceph commands to be back up from a disk failure.

Blake - The disks' positions in the BIOS and on the HBA card do not change.

james beedy (jamesbeedy) wrote :

How might this functionality be implemented? Possibly a resource diff upon poweron; following that, some kind of conditional/partial commissioning so a node's resources could be current? Should I feature request for this? Hmmmm, I have a feeling I'm barking up the wrong tree...per^^, but I can't seem to make sense of this any other way.
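To make the "resource diff upon poweron" idea concrete, here is a minimal sketch of what such a comparison could look like. Everything in it is hypothetical: the `Disk` record, the `diff_disks` helper, and the sample serials are made up for illustration and are not part of MAAS. It only assumes that each disk can be identified by its serial number.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Disk:
    """Hypothetical, minimal view of a block device record."""
    name: str
    serial: str
    size: int  # bytes


def diff_disks(recorded, discovered):
    """Compare the disks on record with those seen at power-on, keyed by serial.

    Returns (removed, added): disks that are on record but no longer present,
    and disks that are present but not yet on record.
    """
    old = {d.serial: d for d in recorded}
    new = {d.serial: d for d in discovered}
    removed = [old[s] for s in sorted(old.keys() - new.keys())]
    added = [new[s] for s in sorted(new.keys() - old.keys())]
    return removed, added


# Made-up inventory: the OSD disk in slot sdb has been swapped.
recorded = [
    Disk("sda", "BOOT0001", 120_000_000_000),
    Disk("sdb", "OSD00001", 4_000_000_000_000),
]
discovered = [
    Disk("sda", "BOOT0001", 120_000_000_000),
    Disk("sdb", "OSD00002", 4_000_000_000_000),
]

removed, added = diff_disks(recorded, discovered)
print("removed:", [d.serial for d in removed])
print("added:", [d.serial for d in added])
```

A non-empty `removed` or `added` list would be the trigger for the conditional/partial commissioning step described above.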

Thanks

james beedy (jamesbeedy) wrote :

Update

I was able to bring my juju env current and finally delete the machine and services from the environment with a combination of "juju resolved <unit>" and "juju destroy-{unit, machine, service} --force <service,machine,unit>".

Gavin Panella (allenap) wrote :

Even when a node has been deployed, the node still attempts to PXE boot
from MAAS each time it's rebooted. MAAS knows it should boot locally and
gives the following configuration to PXELINUX:

  DEFAULT local

  LABEL local
    LOCALBOOT 0

It appears that this does not do the right thing for your hardware. Put
another way, it does not do the same thing as your machine's BIOS does
when the network is unavailable.

I suspect this is a bug in PXELINUX and/or your hardware. There may be
something that MAAS can do to help, but I don't think it's the cause, so
I'll target this bug at PXELINUX and mark it Invalid in MAAS for now.
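
For anyone who wants to experiment with the configuration above outside of MAAS, the following is a rough sketch that renders the same PXELINUX stanza and reads back the LOCALBOOT value. The function names are hypothetical and this is not MAAS's actual template code. The parameter exists only so other LOCALBOOT values from the syslinux documentation can be tried; whether any of them behaves better on this hardware is untested.

```python
def render_local_boot_config(localboot_type=0):
    """Render the local-boot PXELINUX config shown above.

    localboot_type defaults to 0, the value MAAS is described as sending.
    """
    lines = [
        "DEFAULT local",
        "",
        "LABEL local",
        f"  LOCALBOOT {localboot_type}",
    ]
    return "\n".join(lines) + "\n"


def parse_localboot_type(config_text):
    """Return the integer after the first LOCALBOOT directive, or None."""
    for line in config_text.splitlines():
        parts = line.split()
        if len(parts) == 2 and parts[0].upper() == "LOCALBOOT":
            return int(parts[1])
    return None


config = render_local_boot_config()
print(config, end="")
print("LOCALBOOT type:", parse_localboot_type(config))
```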

Changed in maas:
status: Confirmed → Invalid

james beedy (jamesbeedy) wrote :

I really appreciate the input everyone. I guess I was a little overwhelmed dealing with a few different issues at once .... I didn't mean to place the blame on MAAS. That being said, node disk replacement under the direction of MAAS is still a rough process for me. I understand that PXELINUX/BIOS may be the root cause of my issue ... I guess I feel like MAAS had more to do with this because MAAS cannot recognize a new disk after replacement w/o recommissioning. I feel like despite the boot issue, I would still need to take the node down and recommission it for MAAS to take inventory of the new disk after a replacement. Is this being looked into for 1.9?

Thanks again,

James

Blake Rouse (blake-rouse) wrote :

If you know which is the old disk and you have the full information for the new disk, you could use the API to update that disk with all the new disk information. You would need to be very sure about the data or the deployment would fail; that is why it's recommended to re-commission.

maas my-maas-session block-device update 1 model= serial= size= block_size=
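
Since wrong data here would break the deployment, one cautious approach is to build the command from values that have been checked first. The sketch below only assembles the argument list in the same shape as the command above and refuses to continue if any field is empty. It does not run anything. The helper name and the sample disk values are made up for illustration.

```python
def build_block_device_update(profile, device_id, model, serial, size, block_size):
    """Assemble the CLI arguments for the block-device update shown above.

    Raises ValueError if any disk field is missing, so an incomplete
    update is never sent.
    """
    fields = {
        "model": model,
        "serial": serial,
        "size": size,
        "block_size": block_size,
    }
    missing = [key for key, value in fields.items() if value in (None, "")]
    if missing:
        raise ValueError("missing disk fields: " + ", ".join(missing))
    command = ["maas", profile, "block-device", "update", str(device_id)]
    command += [f"{key}={value}" for key, value in fields.items()]
    return command


# Sample values are placeholders, not taken from a real disk.
cmd = build_block_device_update(
    "my-maas-session", 1,
    model="EXAMPLE-MODEL", serial="OSD00002",
    size=4000000000000, block_size=512,
)
print(" ".join(cmd))
```

The resulting list could be passed to `subprocess.run` once the values have been double-checked against the new disk.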
