Node cleaning fails if the _ata_erase fails

Bug #1536695 reported by Om Kumar
This bug affects 2 people
Affects: ironic-python-agent
Status: Fix Released
Importance: High
Assigned to: Julia Kreger

Bug Description

As I understand it, node cleaning should fall back to dd if _ata_erase fails.

| maintenance_reason | Agent returned error for clean step {u'priority': 10, u'interface': |
| | u'deploy', u'step': u'erase_devices', u'abortable': True, |
| | u'reboot_requested': False} on node e3143045-09cf-4892-a699-1e26884e54ae |
| | : {u'message': u'Clean step failed: Error performing clean_step |
| | erase_devices: Error erasing block device: Block device /dev/sda is |
| | frozen and cannot be erased', u'code': 500, u'type': u'CleaningError', |
| | u'details': u'Error performing clean_step erase_devices: Error erasing |
| | block device: Block device /dev/sda is frozen and cannot be erased'}. |
| provision_state | clean failed

| e3143045-09cf-4892-a699-1e26884e54ae | None | None | power off | clean failed | True |

stack@test-blade:~$ sudo hdparm -I /dev/sda

/dev/sda:

ATA device, with non-removable media
        Model Number: XR0480GEBLV
        Serial Number: 14120D0FA6F5
        Firmware Revision: HPS4
        Transport: Serial, ATA8-AST, SATA 1.0a, SATA II Extensions, SATA Rev 2.5, SATA Rev 2.6, SATA Rev 3.0
Standards:
        Used: unknown (minor revision code 0x0028)
        Supported: 9 8 7 6 5
        Likely used: 9
Configuration:
        Logical max current
        cylinders 16383 16383
        heads 16 16
        sectors/track 63 63
        --
        CHS current addressable sectors: 16514064
        LBA user addressable sectors: 268435455
        LBA48 user addressable sectors: 937703088
        Logical Sector size: 512 bytes
        Physical Sector size: 4096 bytes
        Logical Sector-0 offset: 0 bytes
        device size with M = 1024*1024: 457862 MBytes
        device size with M = 1000*1000: 480103 MBytes (480 GB)
        cache/buffer size = unknown
        Form Factor: less than 1.8 inch
        Nominal Media Rotation Rate: Solid State Device
Capabilities:
        LBA, IORDY(can be disabled)
        Queue depth: 32
        Standby timer values: spec'd by Standard, with device specific minimum
        R/W multiple sector transfer: Max = 16 Current = 16
        DMA: mdma0 mdma1 mdma2 udma0 udma1 udma2 udma3 udma4 udma5 *udma6
             Cycle time: min=120ns recommended=120ns
        PIO: pio0 pio1 pio2 pio3 pio4
             Cycle time: no flow control=120ns IORDY flow control=120ns
Commands/features:
        Enabled Supported:
           * SMART feature set
                Security Mode feature set
           * Power Management feature set
                Write cache
           * Look-ahead
                DEVICE_RESET command
           * WRITE_BUFFER command
           * READ_BUFFER command
           * NOP cmd
           * DOWNLOAD_MICROCODE
           * 48-bit Address feature set
           * Device Configuration Overlay feature set
           * Mandatory FLUSH_CACHE
           * FLUSH_CACHE_EXT
           * SMART error logging
           * SMART self-test
           * General Purpose Logging feature set
           * 64-bit World wide name
           * IDLE_IMMEDIATE with UNLOAD
           * WRITE_UNCORRECTABLE_EXT command
           * {READ,WRITE}_DMA_EXT_GPL commands
           * Segmented DOWNLOAD_MICROCODE
           * Gen1 signaling speed (1.5Gb/s)
           * Gen2 signaling speed (3.0Gb/s)
           * Gen3 signaling speed (6.0Gb/s)
           * Native Command Queueing (NCQ)
           * Phy event counters
           * Device automatic Partial to Slumber transitions
           * READ_LOG_DMA_EXT equivalent to READ_LOG_EXT
           * DMA Setup Auto-Activate optimization
                Device-initiated interface power management
                Asynchronous notification (eg. media change)
           * Software settings preservation
           * SMART Command Transport (SCT) feature set
           * SCT Write Same (AC2)
           * SCT Features Control (AC4)
           * SCT Data Tables (AC5)
           * reserved 69[4]
           * Data Set Management TRIM supported (limit 8 blocks)
           * Deterministic read ZEROs after TRIM
Security:
        Master password revision code = 65534
                supported
        not enabled
        not locked
                frozen
        not expired: security count
                supported: enhanced erase
        2min for SECURITY ERASE UNIT. 2min for ENHANCED SECURITY ERASE UNIT.
Logical Unit WWN Device Identifier: 500a07510d0fa6f5
        NAA : 5
        IEEE OUI : 00a075
        Unique ID : 10d0fa6f5
Checksum: correct
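
The cleaning error above matches the "frozen" flag in the Security section of this hdparm output. As a rough illustration only, the sketch below reads that section from `hdparm -I` text and reports the flags. The function name `parse_security_flags` is hypothetical and is not taken from ironic-python-agent; it assumes hdparm prints each flag either bare ("frozen") or negated ("not frozen"), as in the paste above.

```python
def parse_security_flags(hdparm_output):
    """Return ATA security flags found in `hdparm -I` text.

    Hypothetical helper for illustration; not the agent's own parser.
    Assumes flags appear bare ("frozen") or negated ("not frozen").
    """
    flags = {}
    in_security = False
    for raw in hdparm_output.splitlines():
        # Collapse tabs and runs of spaces so "not\tenabled" == "not enabled".
        line = " ".join(raw.split())
        if line == "Security:":
            in_security = True
            continue
        if not in_security:
            continue
        # The next unindented header (e.g. "Logical Unit WWN ...") ends the section.
        if raw and not raw[0].isspace():
            break
        for name in ("supported", "enabled", "locked", "frozen"):
            if line == name:
                flags[name] = True
            elif line == "not " + name:
                flags[name] = False
    return flags
```

For the drive in this report, such a check would show the device as supported, not enabled, not locked, and frozen, which is the state in which a SECURITY ERASE UNIT request is rejected.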

Om Kumar (om-kumar)
description: updated
Changed in ironic-python-agent:
assignee: nobody → Julia Kreger (juliaashleykreger)
OpenStack Infra (hudson-openstack) wrote : Fix proposed to ironic-python-agent (master)

Fix proposed to branch: master
Review: https://review.openstack.org/270902

Changed in ironic-python-agent:
status: New → In Progress
Dmitry Tantsur (divius)
Changed in ironic-python-agent:
importance: Undecided → High
OpenStack Infra (hudson-openstack) wrote : Fix merged to ironic-python-agent (master)

Reviewed: https://review.openstack.org/270902
Committed: https://git.openstack.org/cgit/openstack/ironic-python-agent/commit/?id=ed74a062c19a63a2c05a506c4ed8d3aa4ecfa09e
Submitter: Jenkins
Branch: master

commit ed74a062c19a63a2c05a506c4ed8d3aa4ecfa09e
Author: Julia Kreger <email address hidden>
Date: Thu Jan 21 11:26:44 2016 -0500

    Provide fallback from ATA erase to shredding

    Presently, should the ATA erasure operation fail, IPA halts the
    cleaning process and the node goes to CLEANFAIL state as a result.

    This failure could be the result of a previous cleaning failure
    that left drive security enabled, so code has been added that
    attempts to address this case by unlocking the drive.

    In the event that an operator wishes to automatically fall back to
    disk scrubbing operations, the capability has been added through
    a driver_internal_info field "agent_continue_if_ata_erase_failed"
    that can be set to True. It defaults to False, keeping the
    same behavior that IPA presently exhibits in the event of ATA
    erase operations failing.

    Partial-Bug: #1536695
    Change-Id: I88edd9477f4f05aa55b2fe8efa4bbff1c5573bb1
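
The opt-in behavior described in this commit message can be sketched roughly as follows. This is a minimal illustration, not the merged patch. Only the `driver_internal_info` field name `agent_continue_if_ata_erase_failed` and its default of False come from the commit message. The exception class, the function name, and the `ata_erase` and `shred` callables are hypothetical stand-ins.

```python
class BlockDeviceEraseError(Exception):
    """Hypothetical stand-in for the agent's erase failure exception."""


def erase_block_device(node, device, ata_erase, shred):
    """Try ATA secure erase first, optionally falling back to shredding.

    `node` is a dict shaped like an Ironic node; `ata_erase` and `shred`
    are caller-supplied callables (assumptions made for this sketch).
    Returns the name of the method that erased the device.
    """
    info = node.get("driver_internal_info", {})
    # Field name and default (False) are taken from the commit message.
    continue_on_failure = info.get("agent_continue_if_ata_erase_failed", False)
    try:
        ata_erase(device)
        return "ata"
    except BlockDeviceEraseError:
        if not continue_on_failure:
            # Default: propagate the error so cleaning fails as before.
            raise
    # Operator opted in: fall back to overwriting the disk.
    shred(device)
    return "shred"
```

With the flag unset, a frozen drive still raises and the node ends up in clean failed. With the flag set to True, the same failure leads to the shred path instead.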

Dmitry Tantsur (divius) wrote :

Hi all! Is there anything else to do about this bug?

OpenStack Infra (hudson-openstack) wrote : Fix proposed to ironic-python-agent (stable/mitaka)

Fix proposed to branch: stable/mitaka
Review: https://review.openstack.org/323723

OpenStack Infra (hudson-openstack) wrote : Fix merged to ironic-python-agent (stable/mitaka)

Reviewed: https://review.openstack.org/323723
Committed: https://git.openstack.org/cgit/openstack/ironic-python-agent/commit/?id=b6bdaf040f6cde7f069ea355b7b4b12aea60fbd7
Submitter: Jenkins
Branch: stable/mitaka

commit b6bdaf040f6cde7f069ea355b7b4b12aea60fbd7
Author: Julia Kreger <email address hidden>
Date: Thu Jan 21 11:26:44 2016 -0500

    Provide fallback from ATA erase to shredding

    Presently, should the ATA erasure operation fail, IPA halts the
    cleaning process and the node goes to CLEANFAIL state as a result.

    This failure could be the result of a previous cleaning failure
    that left drive security enabled, so code has been added that
    attempts to address this case by unlocking the drive.

    In the event that an operator wishes to automatically fall back to
    disk scrubbing operations, the capability has been added through
    a driver_internal_info field "agent_continue_if_ata_erase_failed"
    that can be set to True. It defaults to False, keeping the
    same behavior that IPA presently exhibits in the event of ATA
    erase operations failing.

    Partial-Bug: #1536695
    Change-Id: I88edd9477f4f05aa55b2fe8efa4bbff1c5573bb1
    (cherry picked from commit ed74a062c19a63a2c05a506c4ed8d3aa4ecfa09e)

tags: added: in-stable-mitaka
Julia Kreger (juliaashleykreger) wrote :

Hi Dmitry, there is a conductor-side patch to enable the capability to fall back to shredding, but it has not yet landed. https://review.openstack.org/#/c/302819/

Jay Faulkner (jason-oldos) wrote :

Looks like that patch landed, thanks Julia!

Changed in ironic-python-agent:
status: In Progress → Fix Released
Thierry Carrez (ttx) wrote : Fix included in openstack/ironic 6.0.0

This issue was fixed in the openstack/ironic 6.0.0 release.
