create_config_drive_partition failed because ironic-lib call cmd "partprobe /dev/sda" failed

Bug #1670239 reported by Hao Li
This bug affects 3 people
Affects      Status       Importance  Assigned to  Milestone
Ironic       In Progress  Medium      Hao Li
ironic-lib   In Progress  Medium      Hao Li

Bug Description

After successfully deploying a baremetal node with a Windows 2008 R2 x64 image, I rebuild the node with Ubuntu 14.04. The rebuild fails about 10% of the time. I use ironic (ocata) and IPA (ocata). The ironic conductor error log is:

2017-03-05 11:30:00.841 11081 DEBUG ironic.drivers.modules.agent_client [-] Status of agent commands for node 74f7fd91-83ea-49a6-8e49-8385a02c47df: prepare_image: result "None", error "{u'message': u"Command execution failed: Failed to retrieve partition labels on disk /dev/sda for node 74f7fd91-83ea-49a6-8e49-8385a02c47df. Error: Unexpected error while running command.\nCommand: partprobe /dev/sda\nExit code: 1\nStdout: u''\nStderr: u'Error: Partition(s) 1 on /dev/sda have been written, but we have been unable to inform the kernel of the change, probably because it/they are in use. As a result, the old partition(s) will remain in use. You should reboot now before making further changes.\\n'", u'code': 500, u'type': u'CommandExecutionError', u'details': u"Failed to retrieve partition labels on disk /dev/sda for node 74f7fd91-83ea-49a6-8e49-8385a02c47df. Error: Unexpected error while running command.\nCommand: partprobe /dev/sda\nExit code: 1\nStdout: u''\nStderr: u'Error: Partition(s) 1 on /dev/sda have been written, but we have been unable to inform the kernel of the change, probably because it/they are in use. As a result, the old partition(s) will remain in use. You should reboot now before making further changes.\\n'"}" get_commands_status /usr/lib/python2.7/site-packages/ironic/drivers/modules/agent_client.py:107
2017-03-05 11:30:00.841 11081 ERROR ironic.drivers.modules.agent [-] node 74f7fd91-83ea-49a6-8e49-8385a02c47df command status errored: {u'message': u"Command execution failed: Failed to retrieve partition labels on disk /dev/sda for node 74f7fd91-83ea-49a6-8e49-8385a02c47df. Error: Unexpected error while running command.\nCommand: partprobe /dev/sda\nExit code: 1\nStdout: u''\nStderr: u'Error: Partition(s) 1 on /dev/sda have been written, but we have been unable to inform the kernel of the change, probably because it/they are in use. As a result, the old partition(s) will remain in use. You should reboot now before making further changes.\\n'", u'code': 500, u'type': u'CommandExecutionError', u'details': u"Failed to retrieve partition labels on disk /dev/sda for node 74f7fd91-83ea-49a6-8e49-8385a02c47df. Error: Unexpected error while running command.\nCommand: partprobe /dev/sda\nExit code: 1\nStdout: u''\nStderr: u'Error: Partition(s) 1 on /dev/sda have been written, but we have been unable to inform the kernel of the change, probably because it/they are in use. As a result, the old partition(s) will remain in use. You should reboot now before making further changes.\\n'"}

So I analysed the IPA code that writes the image to disk and creates the configdrive partition. I found that this error is related to the following operations:
1. call sgdisk -Z /dev/sda
2. call qemu-img convert ...
3. partprobe /dev/sda

I think:
1. Before executing "partprobe", we should call "udevadm settle --timeout 600" to wait until the udev event queue has been processed.
2. If partprobe still fails, we should retry it up to 3 times, because when I hit the error I logged in to the baremetal node and ran "partprobe /dev/sda" manually, and it succeeded.
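
A minimal sketch of the two suggestions above, written as a standalone Python helper. This is an illustration only, not the ironic-lib patch: the function name settle_and_partprobe, its parameters, and the injectable run/sleep callables are all hypothetical. It assumes that a timeout in "udevadm settle" should be retried in the same way as a partprobe failure.

```python
import subprocess
import time


def _run(cmd):
    """Run a command, raising CalledProcessError on a non-zero exit code."""
    subprocess.run(cmd, check=True, capture_output=True, text=True)


def settle_and_partprobe(device, attempts=3, settle_timeout=600,
                         delay=1.0, run=_run, sleep=time.sleep):
    """Wait for udev to go idle, then ask the kernel to re-read partitions.

    Hypothetical helper (not an ironic-lib API). Returns the number of
    the attempt that succeeded; re-raises the last error if all fail.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(1, attempts + 1):
        try:
            # Suggestion 1: let udev finish its queued events first.
            run(["udevadm", "settle", "--timeout=%d" % settle_timeout])
            run(["partprobe", device])
            return attempt
        except subprocess.CalledProcessError:
            # Suggestion 2: retry, since a later manual run succeeded.
            if attempt == attempts:
                raise
            sleep(delay)
```

Calling settle_and_partprobe("/dev/sda") would run both commands up to three times with a one-second pause between attempts. The run and sleep arguments exist only so the retry logic can be exercised without touching a real disk.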

Hao Li (lihaosz)
Changed in ironic-lib:
assignee: nobody → Hao Li (lihaosz)
Changed in ironic-python-agent:
assignee: nobody → Hao Li (lihaosz)
OpenStack Infra (hudson-openstack) wrote : Fix proposed to ironic-lib (master)

Fix proposed to branch: master
Review: https://review.openstack.org/443604

Changed in ironic-lib:
status: New → In Progress
OpenStack Infra (hudson-openstack) wrote :

Fix proposed to branch: master
Review: https://review.openstack.org/444061

OpenStack Infra (hudson-openstack) wrote : Change abandoned on ironic-lib (master)

Change abandoned by Hao Li (<email address hidden>) on branch: master
Review: https://review.openstack.org/444061
Reason: Repeat commit

Hao Li (lihaosz) wrote :

We should change ./etc/ironic/rootwrap.d/ironic-lib.filters in the ironic project and add:
udevadm: CommandFilter, udevadm, root
We still need it there to avoid gate breakage and preserve compatibility with existing installations.

Changed in ironic:
assignee: nobody → Hao Li (lihaosz)
OpenStack Infra (hudson-openstack) wrote : Related fix proposed to ironic (master)

Related fix proposed to branch: master
Review: https://review.openstack.org/444109

Michael Turek (mjturek)
Changed in ironic:
status: New → In Progress
Vladyslav Drok (vdrok) wrote :

So it seems like this should be fixed in the ironic_lib. IPA should move to using ironic_lib scripts. So, on IPA side, I think the bug can be closed as duplicate of https://bugs.launchpad.net/ironic-python-agent/+bug/1557542. On ironic side there should be no changes, so I'm closing it on ironic project.

Vladyslav Drok (vdrok) wrote :

Hrm, I see you're adding new rules in ironic, so leaving it open there for now.

no longer affects: ironic-python-agent
Changed in ironic:
importance: Undecided → Medium
Changed in ironic-lib:
importance: Undecided → Medium
fxpester (a-yurtaykin) wrote :

Hit this bug too; fixed by using master CoreOS images.

OpenStack Infra (hudson-openstack) wrote : Change abandoned on ironic (master)

Change abandoned by Riccardo Pittau (<email address hidden>) on branch: master
Review: https://review.opendev.org/444109
Reason: inactivity

OpenStack Infra (hudson-openstack) wrote : Change abandoned on ironic-lib (master)

Change abandoned by Riccardo Pittau (<email address hidden>) on branch: master
Review: https://review.opendev.org/443604
Reason: obsolete
