After successfully deploying a bare-metal node with a Windows 2008 R2 x64 image, I rebuild the node with Ubuntu 14.04. About 10% of these rebuilds fail. I am using ironic (Ocata) and IPA (Ocata). The ironic conductor error log is:
2017-03-05 11:30:00.841 11081 DEBUG ironic.drivers.modules.agent_client [-] Status of agent commands for node 74f7fd91-83ea-49a6-8e49-8385a02c47df: prepare_image: result "None", error "{u'message': u"Command execution failed: Failed to retrieve partition labels on disk /dev/sda for node 74f7fd91-83ea-49a6-8e49-8385a02c47df. Error: Unexpected error while running command.\nCommand: partprobe /dev/sda\nExit code: 1\nStdout: u''\nStderr: u'Error: Partition(s) 1 on /dev/sda have been written, but we have been unable to inform the kernel of the change, probably because it/they are in use. As a result, the old partition(s) will remain in use. You should reboot now before making further changes.\\n'", u'code': 500, u'type': u'CommandExecutionError', u'details': u"Failed to retrieve partition labels on disk /dev/sda for node 74f7fd91-83ea-49a6-8e49-8385a02c47df. Error: Unexpected error while running command.\nCommand: partprobe /dev/sda\nExit code: 1\nStdout: u''\nStderr: u'Error: Partition(s) 1 on /dev/sda have been written, but we have been unable to inform the kernel of the change, probably because it/they are in use. As a result, the old partition(s) will remain in use. You should reboot now before making further changes.\\n'"}" get_commands_status /usr/lib/python2.7/site-packages/ironic/drivers/modules/agent_client.py:107
2017-03-05 11:30:00.841 11081 ERROR ironic.drivers.modules.agent [-] node 74f7fd91-83ea-49a6-8e49-8385a02c47df command status errored: {u'message': u"Command execution failed: Failed to retrieve partition labels on disk /dev/sda for node 74f7fd91-83ea-49a6-8e49-8385a02c47df. Error: Unexpected error while running command.\nCommand: partprobe /dev/sda\nExit code: 1\nStdout: u''\nStderr: u'Error: Partition(s) 1 on /dev/sda have been written, but we have been unable to inform the kernel of the change, probably because it/they are in use. As a result, the old partition(s) will remain in use. You should reboot now before making further changes.\\n'", u'code': 500, u'type': u'CommandExecutionError', u'details': u"Failed to retrieve partition labels on disk /dev/sda for node 74f7fd91-83ea-49a6-8e49-8385a02c47df. Error: Unexpected error while running command.\nCommand: partprobe /dev/sda\nExit code: 1\nStdout: u''\nStderr: u'Error: Partition(s) 1 on /dev/sda have been written, but we have been unable to inform the kernel of the change, probably because it/they are in use. As a result, the old partition(s) will remain in use. You should reboot now before making further changes.\\n'"}
So I analysed the IPA code that writes the image to disk and creates the configdrive partition. I found that this error is related to the following operations:
1. call sgdisk -Z /dev/sda
2. call qemu-img convert ...
3. partprobe /dev/sda
I think:
1. Before running "partprobe", we should call "udevadm settle --timeout 600" to wait until the udev event queue has been processed.
2. If partprobe still fails, we should retry it up to 3 times, because when I hit the error, I logged in to the bare-metal node and ran "partprobe /dev/sda" manually, and it succeeded.
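The two suggestions above could be sketched roughly as follows. This is only an illustration, not the actual IPA code or the proposed patch: the function name partprobe_with_retry, its parameters, the 1 second delay between attempts, and the injectable run/sleep callables are all assumptions made for this example.

```python
import subprocess
import time


def partprobe_with_retry(device, attempts=3, delay=1.0, settle_timeout=600,
                         run=subprocess.check_call, sleep=time.sleep):
    """Wait for udev to settle, then run partprobe, retrying on failure.

    Hypothetical helper for illustration only. Returns the number of the
    attempt that succeeded; re-raises the last error if all attempts fail.
    """
    last_error = None
    for attempt in range(1, attempts + 1):
        try:
            # Let udev finish processing pending events before asking the
            # kernel to re-read the partition table.
            run(["udevadm", "settle", "--timeout=%d" % settle_timeout])
            run(["partprobe", device])
            return attempt
        except subprocess.CalledProcessError as exc:
            # Either command exited non-zero; remember the error and retry.
            last_error = exc
            if attempt < attempts:
                sleep(delay)
    raise last_error
```

On a node this would be called as partprobe_with_retry("/dev/sda"). The run and sleep parameters exist only so the retry logic can be exercised without touching a real disk.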
Fix proposed to branch: master
Review: https://review.openstack.org/443604