erase_node MCollective agent sometimes fails to delete MBR

Bug #1437511 reported by Ryan Moe
This bug affects 1 person
Affects: Fuel for OpenStack
Status: Fix Committed
Importance: High
Assigned to: Alex Schultz

Bug Description

It seems that not all calls to erase_data are run. Once the OS drive is wiped, none of the debug messages from the forked process end up in the response, and it seems no subsequent drives are wiped.

In my case there are two drives, sda and nvme0n1. sda contains the OS and nvme0n1 contains the image volume group. When the node is deleted, sda is wiped correctly but nvme0n1 is not.

After deletion:
[root@bootstrap ~]# parted /dev/nvme0n1 print
Model: Unknown (unknown)
Disk /dev/nvme0n1: 400GB
Sector size (logical/physical): 512B/512B
Partition Table: gpt

Number  Start   End     Size    File system  Name     Flags
 1      17.4kB  25.2MB  25.1MB               primary  bios_grub
 2      25.2MB  235MB   210MB                primary  boot
 3      235MB   445MB   210MB   ext2         primary  boot
 4      445MB   400GB   399GB                primary  lvm

If I modify the agent to exclude the device on which the OS resides, then nvme0n1 is wiped correctly and all debug messages are logged.

I'm not exactly sure what's happening, as it's hard to debug this with no log messages and the OS drive getting wiped. It seems like the problem is that once the OS drive is gone, subsequent system() calls fail because /bin/sh no longer exists.
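
To illustrate the hypothesis above: Ruby's Kernel#system returns true on a zero exit status, false on a non-zero one, and nil when the command could not be executed at all. The following is only a minimal sketch, assuming a hypothetical helper named run_step (it is not part of the agent), that makes the "could not be executed" case visible instead of letting it pass silently:

```ruby
# Hypothetical helper, not taken from erase_node.rb.
# Kernel#system returns true (exit status 0), false (non-zero exit status),
# or nil (the command could not be executed at all).
def run_step(*argv)
  case system(*argv)
  when true  then :ok
  when false then :failed
  else            :not_executed
  end
end

# Example calls; 'no-such-command-1437511' is a made-up name used to
# trigger the nil case.
puts run_step('true')
puts run_step('false')
puts run_step('no-such-command-1437511')
```

When the command is passed as separate arguments, Ruby executes it directly rather than through /bin/sh. Whether that alone would be enough once the OS drive has been wiped is untested here, since the binary itself must still be loadable.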

Ryan Moe (rmoe)
description: updated
Ryan Moe (rmoe)
Changed in fuel:
assignee: nobody → Fuel Astute Team (fuel-astute)
importance: Undecided → High
milestone: none → 6.1
tags: added: module-astute
Dmitry Pyzhov (dpyzhov)
Changed in fuel:
assignee: Fuel Astute Team (fuel-astute) → Vladimir Sharshov (vsharshov)
Dmitry Pyzhov (dpyzhov)
Changed in fuel:
status: New → Confirmed
Dmitry Pyzhov (dpyzhov)
tags: added: tricky
Vladimir Sharshov (vsharshov) wrote :

I suppose we need to look at the storage code: https://github.com/stackforge/fuel-astute/blob/master/mcagents/erase_node.rb#L27

Can you provide output for this device:

    udevadm info --query=property --name /dev/nvme0n1

Changed in fuel:
status: Confirmed → Incomplete
Ryan Moe (rmoe) wrote :

I don't have access to this hardware anymore, but I can tell you that on 2.6 kernels those drives have a major of 259 (on 3.x the major is one of the ones we already have in the list, but I don't remember which one). I don't think it's specific to NVMe hardware, though. When I excluded the OS drive (I modified the agent to do this), the NVMe drive was wiped correctly. I haven't tested it yet, but I suspect this could be reproduced without NVMe drives.
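
For context on why the major number matters: a device whose major is missing from the agent's list would never be selected for erasing. A rough sketch of such a filter follows, assuming the major is read from /sys/block/<name>/dev (which holds "major:minor"). The list values and helper names are hypothetical and are not copied from erase_node.rb:

```ruby
# Illustrative values only (3 = IDE disks, 8 = SCSI/SATA disks);
# this is NOT the agent's actual list.
STORAGE_MAJORS = [3, 8].freeze

# Reads the major number from <sys_block>/<name>/dev, which contains "major:minor".
def block_device_major(name, sys_block = '/sys/block')
  File.read(File.join(sys_block, name, 'dev')).strip.split(':').first.to_i
end

# True when the device's major number is in the accepted list.
def storage_device?(name, majors = STORAGE_MAJORS, sys_block = '/sys/block')
  majors.include?(block_device_major(name, sys_block))
end
```

With a list like this, a drive reporting major 259 would be skipped unless 259 were added to it.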

Changed in fuel:
assignee: Vladimir Sharshov (vsharshov) → Alex Schultz (alex-schultz)
Ryan Moe (rmoe) wrote :

I wasn't able to reproduce this with VMs and I don't have access to the hardware where I originally found this issue.

OpenStack Infra (hudson-openstack) wrote : Fix proposed to fuel-astute (master)

Fix proposed to branch: master
Review: https://review.openstack.org/179174

Changed in fuel:
status: Incomplete → In Progress
Dmitry Borodaenko (angdraug) wrote :

Alex, please make sure the fix is merged before EOD today.

OpenStack Infra (hudson-openstack) wrote : Fix merged to fuel-astute (master)

Reviewed: https://review.openstack.org/179174
Committed: https://git.openstack.org/cgit/stackforge/fuel-astute/commit/?id=53aee4a49102a14213631148ce861515fefb3f03
Submitter: Jenkins
Branch: master

commit 53aee4a49102a14213631148ce861515fefb3f03
Author: Alex Schultz <email address hidden>
Date: Tue Apr 28 09:09:30 2015 -0500

    Erase non-root disks first

    This change adjusts the order in which the system block devices (disks)
    are erased as part of the erase_node MCollective action. This change
    queries the block devices for the system and uses lsblk to determine if
    the device has a partition that is currently being used for mounting /.
    The disks that are not used for / are erased first, followed by
    the root disks. This is done to ensure that data disks are properly
    cleaned up and are not accidentally left intact on the chance that
    purging the root disk causes the erase script to fail.

    Change-Id: I17e0e88224ef781b666b100f80904e5abfecd023
    Closes-Bug: 1437511
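
The ordering described in the commit message can be sketched roughly as follows. This is not the merged code: the helper names are hypothetical, and the sketch assumes text shaped like the list output of `lsblk -l -n -o NAME,TYPE,MOUNTPOINT`, where each disk row is followed by the rows of its partitions.

```ruby
# Hypothetical helper: returns the names of disks that hold the filesystem
# mounted at "/". Assumes each 'disk' row is followed by its child rows.
def root_disks(lsblk_output)
  current_disk = nil
  roots = []
  lsblk_output.each_line do |line|
    name, type, mountpoint = line.split
    next if name.nil?                      # skip blank lines
    current_disk = name if type == 'disk'  # remember the enclosing disk
    roots << current_disk if mountpoint == '/' && current_disk
  end
  roots.uniq
end

# Orders disks so that everything not backing "/" comes first and the
# root disk(s) come last.
def erase_order(disks, lsblk_output)
  roots = root_disks(lsblk_output)
  non_root, root = disks.partition { |disk| !roots.include?(disk) }
  non_root + root
end

# Made-up sample input resembling the two-drive setup from the report.
SAMPLE_LSBLK = [
  'sda        disk',
  'sda1       part /boot',
  'sda2       part /',
  'nvme0n1    disk',
  'nvme0n1p4  part'
].join("\n")

p erase_order(%w[sda nvme0n1], SAMPLE_LSBLK)
```

With the sample input, nvme0n1 is placed ahead of sda, so the data disk is handled before the disk that the erase script itself runs from.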

Changed in fuel:
status: In Progress → Fix Committed