Comment 19 for bug 1880828

Taihsiang Ho (tai271828) wrote :

I have narrowed this issue down further. Using "pkill -f qemu-img", we can reproduce the issue and apply the workaround more quickly:

1. Deploy OpenStack (with the latest bionic train bundle and 4 arm servers).
2. Upload the Ubuntu cloud image: "openstack image create --public --container-format=bare --disk-format=qcow2 --property hw_firmware_type=uefi --file ${IMAGE_TO_UPLOAD} ${IMAGE_UPLOADED_NAME}". Either a bionic or a xenial image is fine; both reproduce this issue.
3. Reproduce this issue with "openstack server create --image ${IMAGE_UPLOADED_NAME} --flavor ${FLAVOR_NAME} --key-name ${KEY_NAME} --nic net-id=${net_id} ${INSTANCE_NAME}". (Prepare the flavor, key and network first; refer to the README of the corresponding bundle, or to the description of this bug.) Issue this command several times with a different INSTANCE_NAME each time (more than 5 times is suggested because the reproduction rate is ~80%).
4. Run "nova list" to check the instance creation status.
5. If an instance stays in the "spawning" status indefinitely (e.g. for more than 1 minute), issue "pkill -f qemu-img" on the target nova-compute node to terminate the qemu-img conversion process. Repeat this step until the first instance is created successfully.
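
Step 3 can be scripted so that every launch gets a distinct name. The following is only a sketch: the function name launch_instances and the "repro" name prefix are made up for illustration, and it assumes IMAGE_UPLOADED_NAME, FLAVOR_NAME, KEY_NAME and net_id are already set as in the steps above.

```shell
# Illustrative helper (not part of any bundle): launch COUNT instances
# named PREFIX-1 .. PREFIX-COUNT with the same image, flavor, key and network.
launch_instances() {
    count="${1:-6}"      # more than 5 launches, given the ~80% reproduction rate
    prefix="${2:-repro}" # hypothetical instance name prefix
    i=1
    while [ "$i" -le "$count" ]; do
        openstack server create \
            --image "${IMAGE_UPLOADED_NAME}" \
            --flavor "${FLAVOR_NAME}" \
            --key-name "${KEY_NAME}" \
            --nic "net-id=${net_id}" \
            "${prefix}-${i}"
        i=$((i + 1))
    done
}
```

For example, "launch_instances 6 repro" followed by "nova list" should show which of repro-1 .. repro-6 are stuck in "spawning".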

Several highlights:
1. The qemu image conversion used by OpenStack is "sequential". If one qemu image conversion process does not terminate, the subsequent ones are not started.
2. nova does not time out such an endless qemu image conversion process. (see comment#18)
3. Together, item#1 and item#2 leave every instance creation stuck in the "spawning" status.
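
Until nova has such a timeout, the manual "pkill -f qemu-img" workaround could be approximated by a small watchdog on the nova-compute node. This is a rough sketch, not nova behaviour: the function name kill_stale_qemu_img and the 60-second default are assumptions, and it relies on the procps "ps -o etimes=" field (elapsed seconds). Note that it would also terminate a legitimately slow conversion.

```shell
# Illustrative watchdog: terminate qemu-img processes that have been
# running longer than MAX_AGE seconds (default 60, an arbitrary choice).
kill_stale_qemu_img() {
    max_age="${1:-60}"
    # Same match pattern as the manual "pkill -f qemu-img" workaround.
    for pid in $(pgrep -f qemu-img); do
        # etimes = elapsed time since the process started, in seconds.
        age="$(ps -o etimes= -p "$pid" | tr -d ' ')"
        if [ -n "$age" ] && [ "$age" -gt "$max_age" ]; then
            echo "terminating qemu-img pid ${pid} (running ${age}s)"
            kill "$pid"
        fi
    done
}
```

It could be run periodically, for example "while sleep 30; do kill_stale_qemu_img 60; done", while reproducing the issue.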

Next:
1. We should go back and look at the issue mentioned in comment#4. The root cause looks like "qemu on arm machine". comment#9 describes a quick trial that did not help. However, I could reproduce a similar issue on an arm machine with the qemu package version used by the OpenStack deployment. It is worth looking into why the SRUed deb does not work. (note: bug 1805256 was recently re-opened for a regression.)
2. Triage this bug. For example, once we have identified that the issue is caused by bug 1805256, we should track that bug instead. In addition, I would suggest proposing a feature enhancement like the one in comment#18 (nova should time out an endless qemu image conversion process).