growpart reverses changes because of partx failure

Bug #1244662 reported by Walter Heukels
10
This bug affects 1 person
Affects Status Importance Assigned to Milestone
util-linux (Ubuntu)
Medium
Unassigned

Bug Description

growpart is failing to resize my virtual servers because partx fails to reload the partition table:

root@pierce:~# growpart /dev/vda 3
failed [pt_update:1] pt_update /dev/vda 3
partx: /dev/vda: error updating partition 3
FAILED: pt_resize failed
***** WARNING: Resize failed, attempting to revert ******
Re-reading the partition table ...
BLKRRPART: Device or resource busy
The command to re-read the partition table failed.
Run partprobe(8), kpartx(8) or reboot your system now,
before using mkfs
***** Appears to have gone OK ****

root@pierce:~# partx --update 3 /dev/vda
partx: /dev/vda: error updating partition 3

Apparently it tries to delete the partition, which fails because it's in use:

ioctl(3, BLKPG, {BLKPG_DEL_PARTITION, flags=0, datalen=148, {start=0, length=0, pno=3, devname="", volname=""}}) = -1 EBUSY (Device or resource busy)

partprobe doesn't do this, and it works fine, so when I edit pt_update() like this:

        #partx --update "$part" "$dev"
        partprobe "$dev"

... everything works as expected.

Version numbers:
cloud-guest-utils: 0.27-0ubuntu4
Description: Ubuntu 13.10
Release: 13.10

Revision history for this message
Scott Moser (smoser) wrote :

Could you please provide the layout of /dev/vda before running growpart ?

I can't reproduce this locally.

Changed in cloud-utils (Ubuntu):
status: New → Incomplete
importance: Undecided → Medium
Revision history for this message
Scott Moser (smoser) wrote :

@Phillip,
  I think the loop that is walking the partitions in ./disk-utils/partx.c for upd_parts isn't functioning correctly.
  I seem to only be able to use partx if the partition provided to partx (via '--nr' or by number) is 1.

Revision history for this message
Scott Moser (smoser) wrote :

marking this confirmed. I did recreate.

Changed in cloud-utils (Ubuntu):
status: Incomplete → Confirmed
Revision history for this message
Scott Moser (smoser) wrote :

Attaching 'go-1244662.sh' which recreates this bug in partx.

Changed in util-linux (Ubuntu):
status: New → Confirmed
importance: Undecided → Medium
Revision history for this message
Scott Moser (smoser) wrote :

Attaching a seemingly working patch.

@Psusi,
  Could you please look at this patch and see if you think it is valid? It seems to fix the issue for me.

  I'm not sure if the style is correct. I chose to put the helper function inside partx rather than in partitions.c where other helper functions live (just for reference, I'll attach a patch that does that too, but that requires exporting a new function in libblkid).

Revision history for this message
Scott Moser (smoser) wrote :
tags: added: patch
Revision history for this message
Phillip Susi (psusi) wrote :

Sigh.. looks like a silly confusion between base 0 and base 1 numbering. We really need to get util-linux cleaned up and back in sync with upstream. This branch should fix it.

Revision history for this message
Scott Moser (smoser) wrote :

Phillip,
  Just for the record, this same bug exists in upstream util-linux.
  The patch I applied was against upstream. Will you forward upstream?

  Also, the thing I was unsure of (that motiviated the helper function) was if the blkid_partlist was guaranteed to be ordered by partition number. Ie, what if there was no partition 1, or if the partition numbered 3 was located on the disk before partition 1.

Scott

Revision history for this message
Scott Moser (smoser) wrote :

I was right, testing with a funky layout still shows errors with your patch, but works with my suggested patch.
Heres a test case (modified version of the first one) that shows that failure.

Revision history for this message
Phillip Susi (psusi) wrote :

It is a little difficult to follow your patch since it looks like you just #if 0'd out the existing block and mostly duplicated it but changed to calling an entirely new function that I'm not sure is neccesary.

Revision history for this message
Phillip Susi (psusi) wrote :

Woops, I totally misunderstood what was going on. It looks like it was simply the skip/continue tests were using n when they meant an. Try my branch now.

Revision history for this message
Scott Moser (smoser) wrote :

shoot. I thought I had attached a cleaned up version of that.
I tested your newer diff, and i believe it still to be wrong.

My confusion stems from not knowing how partx_{add,del,resize}_partition functions understand their 'int partno' argument.

Given the partition table layout created by the following input to sfdisk -uS $DEV:
0,0,0
4096,2048,L
0,0,0
2048,2048,L

sfdisk --durmp output shows that as:
$ sudo sfdisk -uS --dump /dev/vdb
# partition table of /dev/vdb
unit: sectors
/dev/vdb1 : start= 0, size= 0, Id= 0
/dev/vdb2 : start= 4096, size= 2048, Id=83
/dev/vdb3 : start= 0, size= 0, Id= 0
/dev/vdb4 : start= 2048, size= 2048, Id=83

There are 2 different numbering schemes being mixed here.
a.) the order of blkid_struct_partlist.parts array as returned by get_partlist
  To my understanding, this is the order as seen in the partition table.
ie:
  vdb2: 1
  vdb4: 0

b.) the "human order" that is input to partx.
  vdb2: 2
  vdb4: 4

The code as it is right now (and with your suggested fix) I believe makes an invalid assumption about the relationship between A and B.

I'm fairly sure that partx_{add,del,resize}_partition assume 'b' for the partno parameter.

My fix was to only deal with 'b' in the loop by adding the helper function that got the partition 'b' with input 'a'. See my cleaned up patch for example.

Revision history for this message
Phillip Susi (psusi) wrote :

Looks good to me, except that an is now an unused local so it should be removed. I wonder if it might not be better to put the helper function in libblkid so others can use it? I suppose that's a question for upstream.

Revision history for this message
Scott Moser (smoser) wrote :

Forwarded upstream to util-linux mailing list: http://article.gmane.org/gmane.linux.utilities.util-linux-ng/8385

Scott Moser (smoser)
no longer affects: cloud-utils (Ubuntu)
Changed in util-linux (Ubuntu):
status: Confirmed → In Progress
Revision history for this message
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package util-linux - 2.20.1-5.1ubuntu12

---------------
util-linux (2.20.1-5.1ubuntu12) trusty; urgency=medium

  * partx/partx.c: apply upstream commit 84ac311212c0696 for fixing bug
   with 'partx --update' when used with partitions other than 1 or out of
   order partition table. (LP: #1244662)
 -- Scott Moser <email address hidden> Tue, 14 Jan 2014 10:06:30 -0500

Changed in util-linux (Ubuntu):
status: In Progress → Fix Released
To post a comment you must log in.
This report contains Public information  Edit
Everyone can see this information.

Other bug subscribers