Activity log for bug #1737556

Date Who What changed Old value New value Message
2017-12-11 14:42:00 Doug Szumski bug added bug
2017-12-11 14:42:45 Doug Szumski description During node cleaning, the generic hardware manager can fail in the `erasing device metadata` step if the GPT is invalid. Specifically this can happen when the hardware manager calls ```sgdisk -Z /dev/somedrive``` to destroy the GPT and MBR data structures. It isn't clear why sgdisk is validating the GPT when the -Z flag (zap all) instructs sgdisk to destroy the GPT. However, upon retrying sgdisk -Z succeeds. Example failure: ``` maintenance_reason | Agent returned error for clean step {u'priority': 99, u'interface': | | | u'deploy', u'reboot_requested': False, u'abortable': True, u'step': | | | u'erase_devices_metadata'} on node 1b973868-9734-4ecf-9700-c0730e97e031 | | | : {u'message': u'Clean step failed: Error performing clean_step | | | erase_devices_metadata: Error erasing block device: Failed to erase the | | | metadata on the device(s): "/dev/nvme3n1": Unexpected error while | | | running command. | | | Command: sgdisk -Z /dev/nvme3n1 | | | Exit code: 2 | | | Stdout: | | | u"Caution! After loading partitions, the CRC doesn\'t check out!\ | | | GPT | | | data structures destroyed! You may now partition the disk using fdisk | | | or\ | | | other utilities.\ | | | " | | | Stderr: u"\\x07Caution: invalid main GPT | | | header, but valid backup; regenerating main header\ | | | from | | | backup!\ | | | \ | | | \\x07Warning! Main partition table CRC mismatch! Loaded | | | backup partition table\ | | | instead of main partition table!\ | | | \ | | | Warning! | | | One or more CRCs don\'t match. You should repair the disk!\ | | | \ | | | Invalid | | | partition data!\ | | | "', u'code': 500, u'type': u'CleaningError', | | | u'details': u'Error performing clean_step erase_devices_metadata: Error | | | erasing block device: Failed to erase the metadata on the device(s): | | | "/dev/nvme3n1": Unexpected error while running command. | | | Command: sgdisk | | | -Z /dev/nvme3n1 | | | Exit code: 2 | | | Stdout: u"Caution! 
After loading | | | partitions, the CRC doesn\'t check out!\ | | | GPT data structures destroyed! | | | You may now partition the disk using fdisk or\ | | | other | | | utilities.\ | | | " | | | Stderr: u"\\x07Caution: invalid main GPT header, but | | | valid backup; regenerating main header\ | | | from backup!\ | | | \ | | | \\x07Warning! | | | Main partition table CRC mismatch! Loaded backup partition | | | table\ | | | instead of main partition table!\ | | | \ | | | Warning! One or more CRCs | | | don\'t match. You should repair the disk!\ | | | \ | | | Invalid partition | | | data!\ | | | "'}. ```maintenance_reason | Agent returned error for clean step {u'priority': 99, u'interface': | | | u'deploy', u'reboot_requested': False, u'abortable': True, u'step': | | | u'erase_devices_metadata'} on node 1b973868-9734-4ecf-9700-c0730e97e031 | | | : {u'message': u'Clean step failed: Error performing clean_step | | | erase_devices_metadata: Error erasing block device: Failed to erase the | | | metadata on the device(s): "/dev/nvme3n1": Unexpected error while | | | running command. | | | Command: sgdisk -Z /dev/nvme3n1 | | | Exit code: 2 | | | Stdout: | | | u"Caution! After loading partitions, the CRC doesn\'t check out!\ | | | GPT | | | data structures destroyed! You may now partition the disk using fdisk | | | or\ | | | other utilities.\ | | | " | | | Stderr: u"\\x07Caution: invalid main GPT | | | header, but valid backup; regenerating main header\ | | | from | | | backup!\ | | | \ | | | \\x07Warning! Main partition table CRC mismatch! Loaded | | | backup partition table\ | | | instead of main partition table!\ | | | \ | | | Warning! | | | One or more CRCs don\'t match. 
You should repair the disk!\ | | | \ | | | Invalid | | | partition data!\ | | | "', u'code': 500, u'type': u'CleaningError', | | | u'details': u'Error performing clean_step erase_devices_metadata: Error | | | erasing block device: Failed to erase the metadata on the device(s): | | | "/dev/nvme3n1": Unexpected error while running command. | | | Command: sgdisk | | | -Z /dev/nvme3n1 | | | Exit code: 2 | | | Stdout: u"Caution! After loading | | | partitions, the CRC doesn\'t check out!\ | | | GPT data structures destroyed! | | | You may now partition the disk using fdisk or\ | | | other | | | utilities.\ | | | " | | | Stderr: u"\\x07Caution: invalid main GPT header, but | | | valid backup; regenerating main header\ | | | from backup!\ | | | \ | | | \\x07Warning! | | | Main partition table CRC mismatch! Loaded backup partition | | | table\ | | | instead of main partition table!\ | | | \ | | | Warning! One or more CRCs | | | don\'t match. You should repair the disk!\ | | | \ | | | Invalid partition | | | data!\ | | | "'}. During node cleaning, the generic hardware manager can fail in the `erasing device metadata` step if the GPT is invalid. Specifically this can happen when the hardware manager calls ```sgdisk -Z /dev/somedrive``` to destroy the GPT and MBR data structures. It isn't clear why sgdisk is validating the GPT when the -Z flag (zap all) instructs sgdisk to destroy the GPT. However, upon retrying sgdisk -Z succeeds. Example failure: |maintenance_reason | Agent returned error for clean step {u'priority': 99, u'interface': | | | u'deploy', u'reboot_requested': False, u'abortable': True, u'step': | | | u'erase_devices_metadata'} on node 1b973868-9734-4ecf-9700-c0730e97e031 | | | : {u'message': u'Clean step failed: Error performing clean_step | | | erase_devices_metadata: Error erasing block device: Failed to erase the | | | metadata on the device(s): "/dev/nvme3n1": Unexpected error while | | | running command. 
| | | Command: sgdisk -Z /dev/nvme3n1 | | | Exit code: 2 | | | Stdout: | | | u"Caution! After loading partitions, the CRC doesn\'t check out!\ | | | GPT | | | data structures destroyed! You may now partition the disk using fdisk | | | or\ | | | other utilities.\ | | | " | | | Stderr: u"\\x07Caution: invalid main GPT | | | header, but valid backup; regenerating main header\ | | | from | | | backup!\ | | | \ | | | \\x07Warning! Main partition table CRC mismatch! Loaded | | | backup partition table\ | | | instead of main partition table!\ | | | \ | | | Warning! | | | One or more CRCs don\'t match. You should repair the disk!\ | | | \ | | | Invalid | | | partition data!\ | | | "', u'code': 500, u'type': u'CleaningError', | | | u'details': u'Error performing clean_step erase_devices_metadata: Error | | | erasing block device: Failed to erase the metadata on the device(s): | | | "/dev/nvme3n1": Unexpected error while running command. | | | Command: sgdisk | | | -Z /dev/nvme3n1 | | | Exit code: 2 | | | Stdout: u"Caution! After loading | | | partitions, the CRC doesn\'t check out!\ | | | GPT data structures destroyed! | | | You may now partition the disk using fdisk or\ | | | other | | | utilities.\ | | | " | | | Stderr: u"\\x07Caution: invalid main GPT header, but | | | valid backup; regenerating main header\ | | | from backup!\ | | | \ | | | \\x07Warning! | | | Main partition table CRC mismatch! Loaded backup partition | | | table\ | | | instead of main partition table!\ | | | \ | | | Warning! One or more CRCs | | | don\'t match. You should repair the disk!\ | | | \ | | | Invalid partition | | | data!\ | | | "'}. 
```maintenance_reason | Agent returned error for clean step {u'priority': 99, u'interface': | | | u'deploy', u'reboot_requested': False, u'abortable': True, u'step': | | | u'erase_devices_metadata'} on node 1b973868-9734-4ecf-9700-c0730e97e031 | | | : {u'message': u'Clean step failed: Error performing clean_step | | | erase_devices_metadata: Error erasing block device: Failed to erase the | | | metadata on the device(s): "/dev/nvme3n1": Unexpected error while | | | running command. | | | Command: sgdisk -Z /dev/nvme3n1 | | | Exit code: 2 | | | Stdout: | | | u"Caution! After loading partitions, the CRC doesn\'t check out!\ | | | GPT | | | data structures destroyed! You may now partition the disk using fdisk | | | or\ | | | other utilities.\ | | | " | | | Stderr: u"\\x07Caution: invalid main GPT | | | header, but valid backup; regenerating main header\ | | | from | | | backup!\ | | | \ | | | \\x07Warning! Main partition table CRC mismatch! Loaded | | | backup partition table\ | | | instead of main partition table!\ | | | \ | | | Warning! | | | One or more CRCs don\'t match. You should repair the disk!\ | | | \ | | | Invalid | | | partition data!\ | | | "', u'code': 500, u'type': u'CleaningError', | | | u'details': u'Error performing clean_step erase_devices_metadata: Error | | | erasing block device: Failed to erase the metadata on the device(s): | | | "/dev/nvme3n1": Unexpected error while running command. | | | Command: sgdisk | | | -Z /dev/nvme3n1 | | | Exit code: 2 | | | Stdout: u"Caution! After loading | | | partitions, the CRC doesn\'t check out!\ | | | GPT data structures destroyed! | | | You may now partition the disk using fdisk or\ | | | other | | | utilities.\ | | | " | | | Stderr: u"\\x07Caution: invalid main GPT header, but | | | valid backup; regenerating main header\ | | | from backup!\ | | | \ | | | \\x07Warning! | | | Main partition table CRC mismatch! Loaded backup partition | | | table\ | | | instead of main partition table!\ | | | \ | | | Warning! 
One or more CRCs | | | don\'t match. You should repair the disk!\ | | | \ | | | Invalid partition | | | data!\ | | | "'}.
2017-12-11 14:44:26 Doug Szumski description During node cleaning, the generic hardware manager can fail in the `erasing device metadata` step if the GPT is invalid. Specifically this can happen when the hardware manager calls ```sgdisk -Z /dev/somedrive``` to destroy the GPT and MBR data structures. It isn't clear why sgdisk is validating the GPT when the -Z flag (zap all) instructs sgdisk to destroy the GPT. However, upon retrying sgdisk -Z succeeds. Example failure: |maintenance_reason | Agent returned error for clean step {u'priority': 99, u'interface': | | | u'deploy', u'reboot_requested': False, u'abortable': True, u'step': | | | u'erase_devices_metadata'} on node 1b973868-9734-4ecf-9700-c0730e97e031 | | | : {u'message': u'Clean step failed: Error performing clean_step | | | erase_devices_metadata: Error erasing block device: Failed to erase the | | | metadata on the device(s): "/dev/nvme3n1": Unexpected error while | | | running command. | | | Command: sgdisk -Z /dev/nvme3n1 | | | Exit code: 2 | | | Stdout: | | | u"Caution! After loading partitions, the CRC doesn\'t check out!\ | | | GPT | | | data structures destroyed! You may now partition the disk using fdisk | | | or\ | | | other utilities.\ | | | " | | | Stderr: u"\\x07Caution: invalid main GPT | | | header, but valid backup; regenerating main header\ | | | from | | | backup!\ | | | \ | | | \\x07Warning! Main partition table CRC mismatch! Loaded | | | backup partition table\ | | | instead of main partition table!\ | | | \ | | | Warning! | | | One or more CRCs don\'t match. You should repair the disk!\ | | | \ | | | Invalid | | | partition data!\ | | | "', u'code': 500, u'type': u'CleaningError', | | | u'details': u'Error performing clean_step erase_devices_metadata: Error | | | erasing block device: Failed to erase the metadata on the device(s): | | | "/dev/nvme3n1": Unexpected error while running command. | | | Command: sgdisk | | | -Z /dev/nvme3n1 | | | Exit code: 2 | | | Stdout: u"Caution! 
After loading | | | partitions, the CRC doesn\'t check out!\ | | | GPT data structures destroyed! | | | You may now partition the disk using fdisk or\ | | | other | | | utilities.\ | | | " | | | Stderr: u"\\x07Caution: invalid main GPT header, but | | | valid backup; regenerating main header\ | | | from backup!\ | | | \ | | | \\x07Warning! | | | Main partition table CRC mismatch! Loaded backup partition | | | table\ | | | instead of main partition table!\ | | | \ | | | Warning! One or more CRCs | | | don\'t match. You should repair the disk!\ | | | \ | | | Invalid partition | | | data!\ | | | "'}. ```maintenance_reason | Agent returned error for clean step {u'priority': 99, u'interface': | | | u'deploy', u'reboot_requested': False, u'abortable': True, u'step': | | | u'erase_devices_metadata'} on node 1b973868-9734-4ecf-9700-c0730e97e031 | | | : {u'message': u'Clean step failed: Error performing clean_step | | | erase_devices_metadata: Error erasing block device: Failed to erase the | | | metadata on the device(s): "/dev/nvme3n1": Unexpected error while | | | running command. | | | Command: sgdisk -Z /dev/nvme3n1 | | | Exit code: 2 | | | Stdout: | | | u"Caution! After loading partitions, the CRC doesn\'t check out!\ | | | GPT | | | data structures destroyed! You may now partition the disk using fdisk | | | or\ | | | other utilities.\ | | | " | | | Stderr: u"\\x07Caution: invalid main GPT | | | header, but valid backup; regenerating main header\ | | | from | | | backup!\ | | | \ | | | \\x07Warning! Main partition table CRC mismatch! Loaded | | | backup partition table\ | | | instead of main partition table!\ | | | \ | | | Warning! | | | One or more CRCs don\'t match. 
You should repair the disk!\ | | | \ | | | Invalid | | | partition data!\ | | | "', u'code': 500, u'type': u'CleaningError', | | | u'details': u'Error performing clean_step erase_devices_metadata: Error | | | erasing block device: Failed to erase the metadata on the device(s): | | | "/dev/nvme3n1": Unexpected error while running command. | | | Command: sgdisk | | | -Z /dev/nvme3n1 | | | Exit code: 2 | | | Stdout: u"Caution! After loading | | | partitions, the CRC doesn\'t check out!\ | | | GPT data structures destroyed! | | | You may now partition the disk using fdisk or\ | | | other | | | utilities.\ | | | " | | | Stderr: u"\\x07Caution: invalid main GPT header, but | | | valid backup; regenerating main header\ | | | from backup!\ | | | \ | | | \\x07Warning! | | | Main partition table CRC mismatch! Loaded backup partition | | | table\ | | | instead of main partition table!\ | | | \ | | | Warning! One or more CRCs | | | don\'t match. You should repair the disk!\ | | | \ | | | Invalid partition | | | data!\ | | | "'}. During node cleaning, the generic hardware manager can fail in the `erasing device metadata` step if the GPT is invalid. Specifically this can happen when the hardware manager calls ```sgdisk -Z /dev/somedrive``` to destroy the GPT and MBR data structures. It isn't clear why sgdisk is validating the GPT when the -Z flag (zap all) instructs sgdisk to destroy the GPT. However, upon retrying sgdisk -Z succeeds. 
Example failure: 2017-12-11 12:14:47.449 7 ERROR ironic.drivers.modules.agent_base_vendor [-] Agent returned error for clean step {u'priority': 99, u'interface': u'deploy', u'reboot_requested': False, u'abortable': True, u'step': u'erase_devices_metadata'} on node 1b973868-9734-4ecf-9700-c0730e97e031 : {u'message': u'Clean step failed: Error performing clean_step erase_devices_metadata: Error erasing block device: Failed to erase the metadata on the device(s): "/dev/nvme3n1": Unexpected error while running command.\nCommand: sgdisk -Z /dev/nvme3n1\nExit code: 2\nStdout: u"Caution! After loading partitions, the CRC doesn\' t check out!\\nGPT data structures destroyed! You may now partition the disk using fdisk or\\nother utilities.\\n"\nStderr: u"\\x07Caution: invalid main GPT header, but valid backup; regenerating main hea der\\nfrom backup!\\n\\n\\x07Warning! Main partition table CRC mismatch! Loaded backup partition table\\ninstead of main partition table!\\n\\nWarning! One or more CRCs don\'t match. You should repair the disk!\\n\\nInvalid partition data!\\n"', u'code': 500, u'type': u'CleaningError', u'details': u'Error performing clean_step erase_devices_metadata: Error erasing block device: Failed to erase the metadat a on the device(s): "/dev/nvme3n1": Unexpected error while running command.\nCommand: sgdisk -Z /dev/nvme3n1\nExit code: 2\nStdout: u"Caution! After loading partitions, the CRC doesn\'t check out!\\nGPT d ata structures destroyed! You may now partition the disk using fdisk or\\nother utilities.\\n"\nStderr: u"\\x07Caution: invalid main GPT header, but valid backup; regenerating main header\\nfrom backup!\\ n\\n\\x07Warning! Main partition table CRC mismatch! Loaded backup partition table\\ninstead of main partition table!\\n\\nWarning! One or more CRCs don\'t match. You should repair the disk!\\n\\nInvalid partition data!\\n"'}.
2017-12-11 14:47:18 Doug Szumski description During node cleaning, the generic hardware manager can fail in the `erasing device metadata` step if the GPT is invalid. Specifically this can happen when the hardware manager calls ```sgdisk -Z /dev/somedrive``` to destroy the GPT and MBR data structures. It isn't clear why sgdisk is validating the GPT when the -Z flag (zap all) instructs sgdisk to destroy the GPT. However, upon retrying sgdisk -Z succeeds. Example failure: 2017-12-11 12:14:47.449 7 ERROR ironic.drivers.modules.agent_base_vendor [-] Agent returned error for clean step {u'priority': 99, u'interface': u'deploy', u'reboot_requested': False, u'abortable': True, u'step': u'erase_devices_metadata'} on node 1b973868-9734-4ecf-9700-c0730e97e031 : {u'message': u'Clean step failed: Error performing clean_step erase_devices_metadata: Error erasing block device: Failed to erase the metadata on the device(s): "/dev/nvme3n1": Unexpected error while running command.\nCommand: sgdisk -Z /dev/nvme3n1\nExit code: 2\nStdout: u"Caution! After loading partitions, the CRC doesn\' t check out!\\nGPT data structures destroyed! You may now partition the disk using fdisk or\\nother utilities.\\n"\nStderr: u"\\x07Caution: invalid main GPT header, but valid backup; regenerating main hea der\\nfrom backup!\\n\\n\\x07Warning! Main partition table CRC mismatch! Loaded backup partition table\\ninstead of main partition table!\\n\\nWarning! One or more CRCs don\'t match. You should repair the disk!\\n\\nInvalid partition data!\\n"', u'code': 500, u'type': u'CleaningError', u'details': u'Error performing clean_step erase_devices_metadata: Error erasing block device: Failed to erase the metadat a on the device(s): "/dev/nvme3n1": Unexpected error while running command.\nCommand: sgdisk -Z /dev/nvme3n1\nExit code: 2\nStdout: u"Caution! After loading partitions, the CRC doesn\'t check out!\\nGPT d ata structures destroyed! 
You may now partition the disk using fdisk or\\nother utilities.\\n"\nStderr: u"\\x07Caution: invalid main GPT header, but valid backup; regenerating main header\\nfrom backup!\\ n\\n\\x07Warning! Main partition table CRC mismatch! Loaded backup partition table\\ninstead of main partition table!\\n\\nWarning! One or more CRCs don\'t match. You should repair the disk!\\n\\nInvalid partition data!\\n"'}. During node cleaning, the generic hardware manager can fail in the `erasing device metadata` step if the GPT is invalid. Specifically this can happen when the hardware manager calls ```sgdisk -Z /dev/somedrive``` to destroy the GPT and MBR data structures. It isn't clear why sgdisk is validating the GPT when the -Z flag (zap all) instructs sgdisk to destroy the GPT. However, upon retrying sgdisk -Z succeeds. Example failure: 2017-12-11 12:14:47.449 7 ERROR ironic.drivers.modules.agent_base_vendor [-] Agent returned error for clean step {u'priority': 99, u'interface': u'deploy', u'reboot_requested': False, u'abortable': True, u'step': u'erase_devices_metadata'} on node 1b973868-9734-4ecf-9700-c0730e97e031 : {u'message': u'Clean step failed: Error performing clean_step erase_devices_metadata: Error erasing block device: Failed to erase the metadata on the device(s): "/dev/nvme3n1": Unexpected error while running command.\nCommand: sgdisk -Z /dev/nvme3n1\nExit code: 2\nStdout: u"Caution! After loading partitions, the CRC doesn\' t check out!\\nGPT data structures destroyed! You may now partition the disk using fdisk or\\nother utilities.\\n"\nStderr: u"\\x07Caution: invalid main GPT header, but valid backup; regenerating main hea der\\nfrom backup!\\n\\n\\x07Warning! Main partition table CRC mismatch! Loaded backup partition table\\ninstead of main partition table!\\n\\nWarning! One or more CRCs don\'t match. 
You should repair the disk!\\n\\nInvalid partition data!\\n"', u'code': 500, u'type': u'CleaningError', u'details': u'Error performing clean_step erase_devices_metadata: Error erasing block device: Failed to erase the metadat a on the device(s): "/dev/nvme3n1": Unexpected error while running command.\nCommand: sgdisk -Z /dev/nvme3n1\nExit code: 2\nStdout: u"Caution! After loading partitions, the CRC doesn\'t check out!\\nGPT d ata structures destroyed! You may now partition the disk using fdisk or\\nother utilities.\\n"\nStderr: u"\\x07Caution: invalid main GPT header, but valid backup; regenerating main header\\nfrom backup!\\ n\\n\\x07Warning! Main partition table CRC mismatch! Loaded backup partition table\\ninstead of main partition table!\\n\\nWarning! One or more CRCs don\'t match. You should repair the disk!\\n\\nInvalid partition data!\\n"'}. Workaround: Retry the cleaning. For example, move the node to the `manage` state, and then to `provide`.
2017-12-11 15:12:14 Dmitry Tantsur ironic-python-agent: status New Triaged
2017-12-11 15:12:18 Dmitry Tantsur ironic-python-agent: importance Undecided High
2018-08-07 14:09:25 John Fulton summary Ironic python agent cleaning fails with invalid GPT Ironic python agent cleaning fails from CRC mismatch
2023-10-24 15:41:48 Jay Faulkner ironic-python-agent: status Triaged Fix Released
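
The description above notes that `sgdisk -Z` exits with code 2 on a disk with a CRC mismatch but succeeds when retried. As a rough illustration of that observation only, the following Python sketch retries the command a fixed number of times. It is not the fix that was released for this bug; the function name `zap_with_retry`, its parameters, and the injectable `run` argument are all hypothetical and exist only for this example.

```python
import subprocess
import time


def zap_with_retry(device, attempts=2, delay=1.0, run=subprocess.run):
    """Run `sgdisk -Z <device>`, retrying when it exits non-zero.

    Hypothetical helper for illustration; not part of ironic-python-agent.
    Returns the number of attempts used, or raises RuntimeError if every
    attempt fails.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")

    result = None
    for attempt in range(1, attempts + 1):
        # The command line is the one quoted in the bug description.
        result = run(["sgdisk", "-Z", device], capture_output=True, text=True)
        if result.returncode == 0:
            return attempt
        if attempt < attempts:
            # Short pause before retrying; the delay value is an assumption.
            time.sleep(delay)

    raise RuntimeError(
        "sgdisk -Z %s failed after %d attempts (exit code %d): %s"
        % (device, attempts, result.returncode, result.stderr)
    )
```

The `run` parameter defaults to `subprocess.run` and can be swapped for a stub, so the retry logic can be exercised without touching a real block device. On a real node, calling `zap_with_retry("/dev/nvme3n1")` would destroy the partition tables on that device, exactly as the original command does.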