The agent does not expect the case where partx returns 0 even if it failed to read the partition table

Bug #1736386 reported by Nikolay Fedotov on 2017-12-05
This bug affects 1 person
Status: In Progress
Assigned to: Nikolay Fedotov

Bug Description

Here ^^^ partx returns 0 even though it "failed to read partition table". The agent believes that everything is OK and goes further. A moment later the agent tries to find the partition by UUID, but it is not visible yet. As a result, a DeviceNotFound exception occurs and the bare metal instance gets stuck in the ERROR state.

The partition exists.

Retrying "nova boot" helps (it succeeds on the second or a later attempt).

See the attached logs.
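
The failure mode described above can be sketched roughly as follows. This is a minimal illustration, not the agent's actual code: the helper names `partx_output_ok` and `rescan_partitions` are hypothetical, and the check assumes the "failed to read partition table" message quoted in this report shows up on stderr.

```python
import subprocess

# Message quoted in this report; assumed (not verified) to appear on stderr.
FAILURE_MARKER = "failed to read partition table"


def partx_output_ok(returncode, stderr):
    """Return True only if partx exited 0 and did not report a read failure."""
    return returncode == 0 and FAILURE_MARKER not in stderr


def rescan_partitions(device, run=subprocess.run):
    """Hypothetical helper: ask the kernel to update its view of the partitions.

    The exit code alone is not trusted, because partx may return 0 while
    still printing the failure message.
    """
    result = run(["partx", "-u", device], capture_output=True, text=True)
    return partx_output_ok(result.returncode, result.stderr)
```

With a check like this, a caller could treat a zero exit code plus the failure message as an error and retry or abort, instead of going on to look up the partition by UUID.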

Nikolay Fedotov (nfedotov) wrote :

Fix proposed to branch: master

Changed in ironic-python-agent:
assignee: nobody → Nikolay Fedotov (nfedotov)
status: New → In Progress
Nikolay Fedotov (nfedotov) wrote :

The proposed fix did not help. Now partx is called 3 times, but it keeps returning the "failed to read partition table" message and the partition is not found. Yet the conductor had detected the partition and passed its UUID to ironic-agent a moment earlier.

Dmitry Tantsur (divius) on 2017-12-11
Changed in ironic-python-agent:
importance: Undecided → Medium

I looked at the logs and I can't help but wonder if what is occurring is that the block device is still locked via the iSCSI connection and has not been fully disengaged and released, which would be required to read the new partition table.

Nikolay Fedotov (nfedotov) wrote :

Yes. It looks like there is a race condition somewhere between "destroying metadata"<->"exposing disk to conductor"->"do partitioning on conductor side". I added a disk wait after the "destroying metadata" step and it works for me now. I created a new issue for "No partition with UUID..." because this one is about partx. Thanks!
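
For illustration only, a wait of this kind could look like the sketch below. It is not the actual change: `wait_for_path` and its parameters are hypothetical, and it simply polls until a device path becomes visible or a timeout expires.

```python
import os
import time


def wait_for_path(path, timeout=10.0, interval=0.5,
                  exists=os.path.exists, sleep=time.sleep,
                  clock=time.monotonic):
    """Poll until `path` exists; return False if `timeout` seconds pass first.

    `exists`, `sleep` and `clock` are injectable so the loop can be tested
    without real devices or real delays.
    """
    deadline = clock() + timeout
    while True:
        if exists(path):
            return True
        if clock() >= deadline:
            return False
        sleep(interval)
```

A caller might pass the block device path, or a by-UUID symlink under /dev/disk (an assumption about how the lookup is done), and only continue once the function returns True.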
