Ironic python agent cleaning fails from CRC mismatch
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
ironic-python-agent | Fix Released | High | Unassigned |
Bug Description
During node cleaning, the generic hardware manager can fail in the `erasing device metadata` step if the GPT is invalid. Specifically, this can happen when the hardware manager calls `sgdisk -Z /dev/somedrive` to destroy the GPT and MBR data structures.
It isn't clear why sgdisk validates the GPT when the -Z flag (zap all) instructs it to destroy the GPT. However, `sgdisk -Z` succeeds when retried on the same device.
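Since a second `sgdisk -Z` run succeeds, one way to picture a mitigation is a small retry wrapper around the command. The following is only a minimal sketch under that assumption and is not the fix that was released in ironic-python-agent. The function name `zap_with_retry` and its `attempts`, `delay`, and `run` parameters are hypothetical and exist only for illustration.

```python
import subprocess
import time


def zap_with_retry(device, attempts=2, delay=1.0, run=subprocess.run):
    """Run `sgdisk -Z <device>`, retrying when it exits non-zero.

    Hypothetical helper, not part of ironic-python-agent.
    Returns the 1-based number of the attempt that succeeded.
    Raises subprocess.CalledProcessError if every attempt fails.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")

    cmd = ["sgdisk", "-Z", device]
    for attempt in range(1, attempts + 1):
        result = run(cmd, capture_output=True, text=True)
        if result.returncode == 0:
            return attempt
        # Pause before the next try, but not after the final one.
        if attempt < attempts:
            time.sleep(delay)

    # Every attempt failed: surface the last exit code and output.
    raise subprocess.CalledProcessError(
        result.returncode, cmd, output=result.stdout, stderr=result.stderr)
```

The `run` parameter is there so the retry logic can be exercised with a stub instead of touching a real disk. Zapping a device is destructive, so the default should only ever be pointed at a drive that is meant to be erased.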
Example failure:
2017-12-11 12:14:47.449 7 ERROR ironic.
u'step': u'erase_
to erase the metadata on the device(s): "/dev/nvme3n1": Unexpected error while running command.\nCommand: sgdisk -Z /dev/nvme3n1\nExit code: 2\nStdout: u"Caution! After loading partitions, the CRC doesn\'
t check out!\\nGPT data structures destroyed! You may now partition the disk using fdisk or\\nother utilities.
der\\nfrom backup!
disk!\\n\\nInvalid partition data!\\n"', u'code': 500, u'type': u'CleaningError', u'details': u'Error performing clean_step erase_devices_
a on the device(s): "/dev/nvme3n1": Unexpected error while running command.\nCommand: sgdisk -Z /dev/nvme3n1\nExit code: 2\nStdout: u"Caution! After loading partitions, the CRC doesn\'t check out!\\nGPT d
ata structures destroyed! You may now partition the disk using fdisk or\\nother utilities.
n\\n\\x07Warning! Main partition table CRC mismatch! Loaded backup partition table\\ninstead of main partition table!\
partition data!\\n"'}.
Workaround:
Retry the cleaning. For example, move the node to the `manage` state, and then to `provide`.
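The workaround can also be scripted. The sketch below is an illustration only: it shells out to the `openstack baremetal node manage` and `openstack baremetal node provide` commands, and the helper names `recleaning_commands` and `retry_cleaning` are hypothetical. It assumes the `openstack` client is installed and authenticated, and it does not wait for the node to finish each state transition.

```python
import subprocess


def recleaning_commands(node):
    """Return the CLI invocations that move a node to manage, then provide.

    Hypothetical helper; `node` is a node name or UUID.
    """
    return [
        ["openstack", "baremetal", "node", "manage", node],
        ["openstack", "baremetal", "node", "provide", node],
    ]


def retry_cleaning(node, run=subprocess.run):
    """Issue both commands in order, stopping at the first failure."""
    for cmd in recleaning_commands(node):
        # check=True raises CalledProcessError on a non-zero exit code.
        run(cmd, check=True)
```

Calling `retry_cleaning("some-node")` would issue the two commands in sequence; `some-node` is a placeholder.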
description: | updated |
Changed in ironic-python-agent: | |
status: | New → Triaged |
importance: | Undecided → High |
summary: |
- Ironic python agent cleaning fails with invalid GPT
+ Ironic python agent cleaning fails from CRC mismatch |
Changed in ironic-python-agent: | |
status: | Triaged → Fix Released |
I also ran into this. I have run the following four times, but the nodes still fail cleaning.
for node_ident in c05-h17-6048r c05-h21-6048r c05-h25-6048r; do
    echo "$node_ident"
    openstack baremetal node manage "$node_ident"
    openstack baremetal node maintenance unset "$node_ident"
    openstack baremetal node provide "$node_ident"
done