Comment 14 for bug 1488594

james beedy (jamesbeedy) wrote : Re: [Bug 1488594] Re: Nodes cannot boot after a storage disk replacement

I really appreciate the input, everyone. I guess I was a little overwhelmed
dealing with a few different issues at once... I didn't mean to place the
blame on MAAS. That being said, replacing a node's disk under MAAS is
still a rough process for me. I understand that PXELINUX/BIOS may be the
root cause of my issue... I guess I felt MAAS had more to do with it
because MAAS cannot recognize a new disk after replacement without
recommissioning. Even setting the boot issue aside, I would still need to
take the node down and recommission it for MAAS to take inventory of the
new disk after a replacement. Is this being looked into for 1.9?

Thanks again,

James

On Mon, Jan 25, 2016 at 6:41 AM, Gavin Panella <email address hidden>
wrote:

> Even when a node has been deployed, the node still attempts to PXE boot
> from MAAS each time it's rebooted. MAAS knows it should boot locally and
> gives the following configuration to PXELINUX:
>
> DEFAULT local
>
> LABEL local
> LOCALBOOT 0
>
> It appears that this does not do the right thing for your hardware. Put
> another way, it does not do the same thing as your machine's BIOS does
> when the network is unavailable.
>
> I suspect this is a bug in PXELINUX and/or your hardware. There may be
> something that MAAS can do to help, but I don't think it's the cause, so
> I'll target this bug at PXELINUX and mark it Invalid in MAAS for now.
>
>
> ** Also affects: syslinux (Ubuntu)
> Importance: Undecided
> Status: New
>
> ** Changed in: maas
> Status: Confirmed => Invalid
>
> --
> You received this bug notification because you are subscribed to the bug
> report.
> https://bugs.launchpad.net/bugs/1488594
>
> Title:
> Nodes cannot boot after a storage disk replacement
>
> Status in MAAS:
> Invalid
> Status in syslinux package in Ubuntu:
> New
>
> Bug description:
> I'm experiencing this issue when I replace any OSD disk on any Ceph
> storage node and then reboot it. Immediately after the node PXE boots,
> the node hangs at a "booting local disk" message and fails to
> time out or boot. A workaround I've found to get a node to boot after
> a storage disk replacement is to momentarily disable MAAS from
> managing the network after powering on a node whose disk has been
> replaced; following that, after the node's PXE boot times out and it
> resorts to booting from local disk into the OS, I re-enable MAAS
> management on that network so the node gets an IP and continues the
> boot process and eventually boots successfully.
>
> It would be nice to get some feedback on what is going on here, and
> also a best practice for what/how to proceed in the case when you need
> to swap storage disks.
>
> Thanks!
>
> maas.log <-- http://paste.ubuntu.com/12193844/
>
> clusterd.log <-- http://paste.ubuntu.com/12193842/
>
> maas - 1.8.0+bzr4001-0ubuntu2~trusty1
> trusty - 14.04.3