resize2fs fail with very large disks from small source image

Bug #955272 reported by Neil Wilson
This bug affects 4 people
Affects                          Status        Importance  Assigned to  Milestone
cloud-initramfs-tools (Ubuntu)   Fix Released  Low         Unassigned
cloud-utils (Ubuntu)             Fix Released  Low         Unassigned

Bug Description

I'm getting failures with growpart and resize2fs with very large disks.

Above about 768GB the online resizing in resize2fs fails due to the lack of allocated blocks on the original filesystem.

The maximum appears to be

201326592 blocks
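
(For reference: 201326592 blocks at 4 KiB per block is 768 GiB, which matches the ceiling described above.)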

And with terabyte disks, the growpart command fails on boot - so the disk partition doesn't grow either.

sfdisk seems to fail in some way.

ProblemType: Bug
DistroRelease: Ubuntu 12.04
Package: cloud-utils 0.25-0ubuntu5
ProcVersionSignature: Ubuntu 3.2.0-18.29-virtual 3.2.9
Uname: Linux 3.2.0-18-virtual x86_64
ApportVersion: 1.94.1-0ubuntu2
Architecture: amd64
Date: Wed Mar 14 17:16:32 2012
PackageArchitecture: all
SourcePackage: cloud-utils
UpgradeStatus: No upgrade log present (probably fresh install)

Revision history for this message
Neil Wilson (neil-aldur) wrote :

Package: cloud-initramfs-growroot 0.4ubuntu1

Revision history for this message
Neil Wilson (neil-aldur) wrote :

2012-03-14 17:43:34,263 - cc_resizefs.py[DEBUG]: resizing root filesystem (type=ext4, maj=253, min=1)
2012-03-14 17:46:35,813 - cc_resizefs.py[WARNING]: Failed to resize filesystem (['resize2fs', '/tmp/tmpjdcY6H'])
2012-03-14 17:46:35,813 - cc_resizefs.py[WARNING]: output=Filesystem at /tmp/tmpjdcY6H is mounted on /; on-line resizing required
old desc_blocks = 1, new_desc_blocks = 80
Performing an on-line resize of /tmp/tmpjdcY6H to 335544312 (4k) blocks.

error=resize2fs 1.41.14 (22-Dec-2010)
resize2fs: Operation not permitted While trying to add group #8192

2012-03-14 17:46:35,874 - __init__.py[WARNING]: Traceback (most recent call last):
  File "/usr/lib/python2.7/dist-packages/cloudinit/CloudConfig/__init__.py", line 108, in run_cc_modules
    cc.handle(name, run_args, freq=freq)
  File "/usr/lib/python2.7/dist-packages/cloudinit/CloudConfig/__init__.py", line 72, in handle
    [ name, self.cfg, self.cloud, cloudinit.log, args ])
  File "/usr/lib/python2.7/dist-packages/cloudinit/__init__.py", line 309, in sem_and_run
    func(*args)
  File "/usr/lib/python2.7/dist-packages/cloudinit/CloudConfig/cc_resizefs.py", line 74, in handle
    (out,err) = util.subp(resize_cmd)
  File "/usr/lib/python2.7/dist-packages/cloudinit/util.py", line 148, in subp
    raise subprocess.CalledProcessError(sp.returncode,args, (out,err))
CalledProcessError: Command '['resize2fs', '/tmp/tmpjdcY6H']' returned non-zero exit status 1

2012-03-14 17:46:35,875 - __init__.py[ERROR]: config handling of resizefs, None, [] failed

Revision history for this message
Neil Wilson (neil-aldur) wrote :

Similarly on precise:

2012-03-14 18:11:00,708 - __init__.py[DEBUG]: handling resizefs with freq=None and args=[]
2012-03-14 18:11:00,721 - cc_resizefs.py[DEBUG]: resizing root filesystem (type=ext4, maj=253, min=1)
2012-03-14 18:14:04,588 - cc_resizefs.py[WARNING]: Failed to resize filesystem (['resize2fs', '/tmp/tmpN4o5Tg'])
2012-03-14 18:14:04,588 - cc_resizefs.py[WARNING]: output=Filesystem at /tmp/tmpN4o5Tg is mounted on /; on-line resizing required
old_desc_blocks = 1, new_desc_blocks = 80
Performing an on-line resize of /tmp/tmpN4o5Tg to 335544292 (4k) blocks.

error=resize2fs 1.42 (29-Nov-2011)
resize2fs: Operation not permitted While trying to add group #8192

Revision history for this message
Neil Wilson (neil-aldur) wrote :

From https://bugs.launchpad.net/ubuntu/+source/e2fsprogs/+bug/656115

"Note that we reserve enough GDT blocks so you can grow the filesystem by a factor of 1024 of the initial size."

The source image is a 1G partition.
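
As a rough sanity check: a 1 GiB filesystem with 4 KiB blocks is 262144 blocks, so a factor-of-1024 reserve only guarantees growth to about 268435456 blocks (roughly 1 TiB). Terabyte-plus targets exceed that, and the observed limit of 201326592 blocks (768 GiB) is lower still.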

Revision history for this message
Scott Moser (smoser) wrote :

Neil,
  Do you have a suggestion on how best to handle this?
  bug 926160 has some data on what we're wasting in overhead on smaller (2G size), and that is non-trivial.

  The source here is probably 1408M, rather than 1G. Ben and I had estimated that for the images on cloud-images.ubuntu.com that FS could grow to ~1T, but your data shows that is different. How reasonable do you think it is that people are going to have/*need* a 768G root filesystem? Offset that by the cost of wasted metadata space on smaller roots.

  Thoughts?

tags: added: cloud-images ec2-images
Revision history for this message
Scott Moser (smoser) wrote :

I guess that the sfdisk issue is separate from the filesystem overhead.

Revision history for this message
Launchpad Janitor (janitor) wrote :

Status changed to 'Confirmed' because the bug affects multiple users.

Changed in cloud-initramfs-tools (Ubuntu):
status: New → Confirmed
Changed in cloud-utils (Ubuntu):
status: New → Confirmed
Revision history for this message
Neil Wilson (neil-aldur) wrote :

Yep,

There are two bugs here. I'll make this one the 'resize' bug and raise another for the growpart failure once I've worked out what's going on there.

My initial thought was to have cc_resizefs.py check the existing filesystem to see what its limits are, and then resize to the smaller of those limits or the partition size - rather than failing horribly.

But it doesn't appear to be entirely trivial to work out what the maximum filesystem size is - even though mkfs.ext4 tells you when the filesystem is created.
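
For what it's worth, a rough sketch of how the on-line growth ceiling could be estimated from dumpe2fs output (purely illustrative, not what cloud-init does; it assumes an ext4 filesystem without the 64bit feature, i.e. 32-byte group descriptors, and the device name /dev/vda1 is just an example):

DEV=/dev/vda1                                # hypothetical device, adjust as needed
eval "$(sudo dumpe2fs -h "$DEV" 2>/dev/null | awk -F': +' '
  /^Block count/         {print "blocks=" $2}
  /^Block size/          {print "bsize=" $2}
  /^Blocks per group/    {print "bpg=" $2}
  /^Reserved GDT blocks/ {print "rgdt=" $2}')"
groups=$(( (blocks + bpg - 1) / bpg ))       # block groups in use today
dpb=$(( bsize / 32 ))                        # group descriptors per GDT block
gdt=$(( (groups + dpb - 1) / dpb ))          # GDT blocks currently in use
echo "growth ceiling: ~$(( (gdt + rgdt) * dpb * bpg )) blocks (currently $blocks)"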

In my use case we've just launched terabyte disks on the Brightbox Cloud. Launching those with an ubuntu image created using default mkfs.ext4 parameters is what is causing these faults. (Note that they are not the standard images, but are built with live-build in a similar manner).

summary: - growpart and resize2fs fail with very large disks
+ resize2fs fail with very large disks
summary: - resize2fs fail with very large disks
+ resize2fs fail with very large disks from small source image
Revision history for this message
Neil Wilson (neil-aldur) wrote :

The workaround is to create a 128M journal via

mkfs.ext4 -J size=128

which is the size of journal generally created on a 20G disk anyway.

This costs 24k blocks (96M) on a 1G partition - which is about 10% of the free blocks available.

When you do this, Oneiric's resize2fs now appears to complete correctly (with a 1310720M partition).

ubuntu@srv-xqubo:~$ resize2fs xxl
resize2fs 1.41.14 (22-Dec-2010)
Resizing the filesystem on xxl to 335544320 (4k) blocks.
The filesystem on xxl is now 335544320 blocks long.

But if you run e2fsck across that filesystem it is heavily corrupted.
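
For context, a minimal sketch of how a test image along these lines might be put together (the file name and sizes are illustrative, not the exact commands used here):

truncate -s 1G xxl               # small source image, roughly the 1G partition above
mkfs.ext4 -F -J size=128 xxl     # 128M journal, per the workaround
truncate -s 1310720M xxl         # grow the backing file to the target size
resize2fs xxl                    # off-line resize of the unmounted image
e2fsck -f xxl                    # then check the result for consistency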

Revision history for this message
Neil Wilson (neil-aldur) wrote :

Same problem on precise.

Revision history for this message
Neil Wilson (neil-aldur) wrote :

Online resize still fails:

ubuntu@srv-eej0z:~$ sudo resize2fs /dev/loop0
resize2fs 1.42 (29-Nov-2011)
Filesystem at /dev/loop0 is mounted on /mnt; on-line resizing required
old_desc_blocks = 1, new_desc_blocks = 80
Performing an on-line resize of /dev/loop0 to 335544320 (4k) blocks.
resize2fs: Operation not permitted While trying to add group #8192

But the extended filesystem remains consistent when it does fail. The failure occurs at the maximum block count that was shown when the original 1G filesystem was created.

So perhaps all we need to do is catch the error and report that the maximum filesystem size has been reached.
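
A hedged sketch of what "catch the error" could look like at the shell level (purely illustrative, not the change that eventually landed, and the wrapper name is made up):

grow_fs_capped() {
  dev="$1"
  if ! out=$(resize2fs "$dev" 2>&1); then
    case "$out" in
      *"Operation not permitted"*"add group"*)
        # The reserved GDT blocks ran out: report it and carry on rather than fail hard.
        echo "resize2fs: filesystem on $dev has reached its built-in growth limit" >&2
        return 0 ;;
      *)
        echo "$out" >&2
        return 1 ;;
    esac
  fi
}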

Revision history for this message
Neil Wilson (neil-aldur) wrote :

I've logged bug #956038 for the failure of resize2fs to check the maximum size of the filesystem.

The workaround is to allocate the maximum number of resize blocks with:

mkfs.ext4 -J size=128 -E resize=4294967295 -F tiny

The '2^32-1' number is to work around faults in older versions of 'mke2fs'.

This is at a cost of about 29k blocks on the filesystem or about 11%.
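
A quick way to confirm the larger reserve took effect (assuming the image file is named 'tiny', as in the command above):

dumpe2fs -h tiny | grep -i 'reserved gdt blocks'
# A default 1G filesystem reserves only enough GDT blocks for ~1024x growth;
# with -E resize=4294967295 the reserve should be near the most a non-64bit
# ext4 filesystem can use.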

Scott Moser (smoser)
Changed in cloud-initramfs-tools (Ubuntu):
importance: Undecided → Low
Changed in cloud-utils (Ubuntu):
importance: Undecided → Low
Revision history for this message
Ben Howard (darkmuggle-deactivatedaccount) wrote :

In testing, "mkfs.ext4 -E resize=536870912" results in no additional space utilization while allowing a full resize to 2TB. My test case was to create two new 2G filesystems, copy over a cloud image root.tar.gz to both, then resize one to 2TB.

/dev/loop1 2.0G 803M 1.1G 42% /mnt/test/small_disk
/dev/loop0 2.0T 800M 1.9T 1% /mnt/test/big_disk

The resize times are on the order of 11-20 minutes.

However, since using "-E resize=..." doesn't increase the size of the disk and works, I think that this is a sane approach. I want to do a test build through the build system and have Neil verify, but I see no reason not to enable this. The danger I see with this approach, however, is that the default journal size on these small cloud images is 64MB. I seriously doubt that a 64MB journal is going to cut it if you have 2TB of disk.
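
For reference, a rough sketch of that kind of off-line test (file names and mount points are illustrative; the root.tar.gz copy step is omitted):

truncate -s 2G small.img big.img
mkfs.ext4 -F -E resize=536870912 small.img
mkfs.ext4 -F -E resize=536870912 big.img
truncate -s 2T big.img                        # grow only the second image
e2fsck -f big.img                             # resize2fs expects a freshly checked fs
resize2fs big.img                             # off-line resize to the full 2TB
sudo mkdir -p /mnt/test/small_disk /mnt/test/big_disk
sudo mount -o loop small.img /mnt/test/small_disk
sudo mount -o loop big.img /mnt/test/big_disk
df -h /mnt/test/small_disk /mnt/test/big_disk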

Revision history for this message
Ben Howard (darkmuggle-deactivatedaccount) wrote :

I might add that my test was for an offline resize, not an online resize. In testing, a 2GB ext4 filesystem can be resized to 2TB without the "-E resize..." option.

Revision history for this message
Ben Howard (darkmuggle-deactivatedaccount) wrote :

In attempting to repro this using real disks on EC2 (4x 450GB ephemeral storage devices in an LVM configuration), the resize times are truly dreadful. I have to question the utility of resizing a small cloud image to a massive one, as it took nearly 45 minutes to do an online resize.

But I have been unable to replicate this.

With that said, Neil, can you test a build with the resize=... applied:
http://people.canonical.com/~ben/drops/precise-server-cloudimg-amd64-disk1.img

Revision history for this message
Neil Wilson (neil-aldur) wrote : Re: [Bug 955272] Re: resize2fs fail with very large disks from small source image

Hi Ben,

Thanks very much for testing this out.

I've run that image on Brightbox Cloud and it takes 5 mins 25 seconds from
issuing the 'create' command to the ssh being available with a 2TB disk. So
that's including all the server provisioning, cloud-init key metadata stuff
and the initial login delay as well as the resize.

The output is correct:

ubuntu@srv-cdr4c:~$ df -h /dev/vda1
Filesystem Size Used Avail Use% Mounted on
/dev/vda1 2.0T 771M 1.9T 1% /

If you do the same with the standard precise image from 'cloud-images' then
you get:

ubuntu@srv-pkjet:~$ df -h
Filesystem Size Used Avail Use% Mounted on
/dev/vda1 1.4T 772M 1.3T 1% /

The cloudinit logs have:

2013-02-15 14:26:47,275 - cc_resizefs.py[WARNING]: Failed to resize
filesystem (['resize2fs', '/run/cloudinit.resizefs.sziNUX'])
2013-02-15 14:26:47,280 - cc_resizefs.py[WARNING]: output=Filesystem at
/run/cloudinit.resizefs.sziNUX is mounted on /; on-line resizing required
old_desc_blocks = 1, new_desc_blocks = 128
Performing an on-line resize of /run/cloudinit.resizefs.sziNUX to 536868202
(4k) blocks.

error=resize2fs 1.42 (29-Nov-2011)
resize2fs: Operation not permitted While trying to add group #11264

And the file system is now corrupt as well.

So the corrected resize entry fixes the problem.

The build is somewhat slower than the usual 50 secs for a 20G server with
the standard image, but still within the bounds of usability I think.

A nice to have would be a 'resizing disk...' message on the console. The
console on our cloud is available as soon as the VM starts and you can
watch the boot process.

Tests Linux Magazine did a few years ago suggest that the 64MB journal should
be fine unless you require lots of little files in lots of directories:
http://www.linux-mag.com/id/7666/ (admittedly with a smaller disk).


Changed in cloud-initramfs-tools (Ubuntu):
status: Confirmed → Fix Released
Changed in cloud-utils (Ubuntu):
status: Confirmed → Fix Released