No support for sparse images

Bug #810493 reported by Soren Hansen
This bug affects 4 people
Affects: glance_store
Status: Won't Fix
Importance: Wishlist
Assigned to: Unassigned

Bug Description

I could have sworn I filed this bug already, but I don't see it now. Oh, well.

Glance does not seem to support any sort of sparse images. For example, Ubuntu's cloud images are a 1½ GB filesystem, but if it were sparsely allocated it would only take up a couple of hundred MB.

Amazon handles this by using tarballs as their image transport format.

Vish Ishaya (vishvananda) wrote : Re: [Bug 810493] [NEW] No support for sparse images

It is possible to put a qcow2 image into glance if you are using kvm.

Vish

On Jul 14, 2011, at 6:58 AM, Soren Hansen wrote:

> Public bug reported:
>
> I could have sworn I filed this bug already, but I don't see it now. Oh,
> well.
>
> Glance does not seem to support any sort of sparse images. For example,
> Ubuntu's cloud images are a 1½ GB filesystem, but if it were sparsely
> allocated it would only take up a couple of hundred MB.
>
> Amazon handles this by using tarballs as their image transport format.

Soren Hansen (soren) wrote :

2011/7/14 Vish Ishaya <email address hidden>:
> It is possible to put a qcow2 image into glance if you are using kvm.

It's debatable whether qcow2 images really are sparse :) but
nevertheless, I think Glance should support this more natively. There
is a *lot* of disk space and network traffic to be saved here.

If we take the Ubuntu 11.04 cloud image as an example:
$ ls -l ubuntu-11.04
-rw-r--r-- 2 glance nogroup 1476395008 2011-07-07 14:43 ubuntu-11.04

# Create a sparse version:
$ cp --sparse=always ubuntu-11.04 ubuntu-11.04.sparse

# a tar'ed version:
$ tar cvSf ubuntu-11.04.sparse.tar ubuntu-11.04.sparse

# Both gzip+tar:
$ tar cvzSf ubuntu-11.04.sparse.tar.gz ubuntu-11.04.sparse

$ ls -ls ubuntu-11.04*
1441824 -rw-r--r-- 2 soren soren 1476395008 2011-07-07 14:43 ubuntu-11.04
 567012 -rw-r--r-- 1 soren soren 1476395008 2011-07-15 15:14 ubuntu-11.04.sparse
 478856 -rw-r--r-- 1 soren soren  490342400 2011-07-15 15:29 ubuntu-11.04.sparse.tar
 172208 -rw-r--r-- 1 soren soren  176339294 2011-07-15 15:24 ubuntu-11.04.sparse.tar.gz

Unsurprisingly, the worst sinner is the fully allocated disk image.
It's costly to store and costly to transfer over the network.

The sparsely allocated image is quite a bit better in terms of storage
space on the Glance host, but doesn't save any network bandwidth.

The tar'ed version further reduces the allocated size by quite a bit
and also reduces the amount of data that needs to be transferred over
the network.

The tar+gzipped version, though, is the clear winner. It's only a
fraction of the size of the other images and also saves a *lot* of
bandwidth.
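
The first column of the `ls -ls` listing is the allocated size, the larger number the apparent size. The same two figures can be read programmatically; the following is a minimal Python 3 sketch, assuming a POSIX system where `st_blocks` counts 512-byte units. The `sizes` helper and the 64 MiB demo file are made up for illustration, and whether the demo file really ends up with almost nothing allocated depends on the filesystem.

```python
import os
import tempfile


def sizes(path):
    """Return (apparent_bytes, allocated_bytes) for path.

    Assumes POSIX semantics: st_blocks counts 512-byte units.
    """
    st = os.stat(path)
    return st.st_size, st.st_blocks * 512


# Demo: a 64 MiB file that consists of a single hole.
with tempfile.NamedTemporaryFile(delete=False) as f:
    f.truncate(64 * 1024 * 1024)
    demo_path = f.name

apparent, allocated = sizes(demo_path)
print("apparent: %d bytes, allocated: %d bytes" % (apparent, allocated))
os.unlink(demo_path)
```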

Just as an extra data point, I also added a gzip'ed version of the
original image:
 174448 -rw-r--r-- 1 soren soren 178628893 2011-07-15 15:34 ubuntu-11.04.gz

..but before anyone suggests just adding support for Content-Encoding:
gzip, remember that this means that Glance would have to process the
full image (in this case, a reasonably measly 1.4 GB image, but
potentially much larger images) and compress it in-flight. The
receiving end would also need to do special tricks to maintain the
image's sparseness, as gunzip doesn't care to create sparse files on
unpack. Adding tar to the mix means that only 490342400 bytes need to
get gzip'ed, and tar natively handles packing/unpacking sparse images
effectively.
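
To make the "special tricks" on the receiving end concrete, here is a rough Python 3 sketch of unpacking a gzip stream while seeking over all-zero chunks instead of writing them. This is not something Glance does; `gunzip_sparse` and the 64 KiB chunk size are assumptions made up for the example, and whether the skipped ranges actually become holes depends on the filesystem.

```python
import gzip
import os
import tempfile

CHUNK = 64 * 1024  # arbitrary chunk size for this sketch


def gunzip_sparse(src_path, dst_path, chunk_size=CHUNK):
    """Decompress src_path into dst_path, seeking over all-zero chunks."""
    with gzip.open(src_path, "rb") as src, open(dst_path, "wb") as dst:
        while True:
            buf = src.read(chunk_size)
            if not buf:
                break
            if buf.count(0) == len(buf):
                # Nothing but NUL bytes: skip ahead instead of writing.
                dst.seek(len(buf), os.SEEK_CUR)
            else:
                dst.write(buf)
        # If the data ended in a skipped chunk, extend the file to full length.
        dst.truncate(dst.tell())


# Demo: round-trip a small "image" with a large run of zeros in the middle.
workdir = tempfile.mkdtemp()
gz_path = os.path.join(workdir, "image.gz")
out_path = os.path.join(workdir, "image.raw")
original = b"head" + bytes(1024 * 1024) + b"tail"
with gzip.open(gz_path, "wb") as f:
    f.write(original)
gunzip_sparse(gz_path, out_path)
with open(out_path, "rb") as f:
    restored = f.read()
print(restored == original)
```

Note that the sender still has to compress every zero byte of the full image, which is the cost the tar approach avoids.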

--
Soren Hansen        | http://linux2go.dk/
Ubuntu Developer    | http://www.ubuntu.com/
OpenStack Developer | http://www.openstack.org/

Ewan Mellor (ewanmellor) wrote :

There is lots of open source code for handling VHD files in userspace. It's come from the work that we did on Xen, but none of it is Xen-specific. Someone could repurpose that very usefully. VHD files are perfect for this use-case.

Ewan.


Jay Pipes (jaypipes)
Changed in glance:
status: New → Confirmed
importance: Undecided → Wishlist
Lothar Gesslein (lgesslein) wrote :

I've written patches to scratch our own itch that make both glance and nova-compute write raw sparse files to disk. They do not change the transport format, so all zeros still go uncompressed over the network. Obviously this needs some more work to make it a configuration option, and maybe someone has a faster way to check for all-zeros blocks.

Glance:

diff --git a/glance/store/filesystem.py b/glance/store/filesystem.py
index 6f028b7..b5ba775 100644
--- a/glance/store/filesystem.py
+++ b/glance/store/filesystem.py
@@ -24,6 +24,7 @@ import hashlib
 import logging
 import os
 import urlparse
+import re

 from glance.common import cfg
 from glance.common import exception
@@ -197,13 +198,18 @@ class Store(glance.store.base.Store):

         checksum = hashlib.md5()
         bytes_written = 0
+        CHECK_RE = re.compile('^[0]*$')
         try:
             with open(filepath, 'wb') as f:
                 for buf in utils.chunkreadable(image_file,
                                               ChunkedFile.CHUNKSIZE):
                     bytes_written += len(buf)
                     checksum.update(buf)
-                    f.write(buf)
+                    if CHECK_RE.match(buf.encode("hex")):
+                        f.seek(len(buf),1)
+                        f.truncate()
+                    else:
+                        f.write(buf)
         except IOError as e:
             if e.errno in [errno.EFBIG, errno.ENOSPC]:
                 raise exception.StorageFull()

Nova:

diff --git a/nova/image/glance.py b/nova/image/glance.py
index 97a60cb..10b3a76 100644
--- a/nova/image/glance.py
+++ b/nova/image/glance.py
@@ -26,6 +26,7 @@ import random
 import sys
 import time
 import urlparse
+import re

 from glance.common import exception as glance_exception

@@ -262,8 +263,13 @@ class GlanceImageService(object):
         except Exception:
             _reraise_translated_image_exception(image_id)

+        CHECK_RE = re.compile('^[0]*$')
         for chunk in image_chunks:
-            data.write(chunk)
+            if CHECK_RE.match(chunk.encode("hex")):
+                data.seek(len(chunk),1)
+                data.truncate()
+            else:
+                data.write(chunk)

         base_image_meta = self._translate_from_glance(image_meta)
         return base_image_meta
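
On the question of a faster all-zeros check: the regex in the patches above runs over a hex dump, which means building a string twice the size of every chunk before matching. A couple of alternatives that avoid the hex step are sketched below in Python 3 (the patches themselves are Python 2, where `buf.encode("hex")` is available). The function names are hypothetical and none of this has been benchmarked against real image data.

```python
import re

CHECK_RE = re.compile("^[0]*$")


def is_zero_hex_regex(buf):
    """Python 3 rendering of the check used in the patches above."""
    return CHECK_RE.match(buf.hex()) is not None


def is_zero_count(buf):
    """Count NUL bytes directly instead of matching a hex dump."""
    return buf.count(0) == len(buf)


def is_zero_strip(buf):
    """Strip NUL bytes from both ends; anything left means real data."""
    return not buf.strip(b"\x00")


# Demo: all three checks agree on a few sample chunks.
samples = [bytes(4096), b"\x00" * 100 + b"\x01", b"\x10" * 8, b""]
results = [
    (is_zero_hex_regex(s), is_zero_count(s), is_zero_strip(s))
    for s in samples
]
print(results)
```

All three treat an empty chunk as "all zeros", which is harmless here because seeking by zero bytes is a no-op.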

Christian Berendt (berendt) wrote :

I committed the patch provided by Lothar for Glance and Nova to get this feature into Grizzly. I added two configuration parameters to make the feature optional by default.

Nova: https://review.openstack.org/#/c/17544/
Glance: https://review.openstack.org/#/c/17542/

Changed in glance:
assignee: nobody → Christian Berendt (berendt)
status: Confirmed → In Progress
Changed in nova:
status: New → In Progress
assignee: nobody → Christian Berendt (berendt)
importance: Undecided → Wishlist
milestone: none → havana-3
Thierry Carrez (ttx)
Changed in nova:
milestone: havana-3 → havana-rc1
Changed in nova:
milestone: havana-rc1 → none
Tom Fifield (fifieldt) wrote :

Looks like this bug hasn't been touched in a while. I'm going to remove the assignee and set the status back to confirmed. If this was done in error, please change back :)

Changed in nova:
assignee: Christian Berendt (berendt) → nobody
Changed in glance:
assignee: Christian Berendt (berendt) → nobody
Changed in nova:
status: In Progress → Confirmed
Changed in glance:
status: In Progress → Confirmed
Sean Dague (sdague) wrote :

Until the glance issue is addressed, it's not possible to do anything in nova for this. Removing nova.

no longer affects: nova
Erno Kuvaja (jokke) wrote :

Moving to store as that's where this should be coming from.

affects: glance → glance-store
Ian Cordasco (icordasc) wrote :

This has sat idle for more than 2 years. I'm going to close this in a week if no one can confirm this is still a valid issue for glance_store.

tags: added: propose-close
Ian Cordasco (icordasc) wrote :

Closing as promised.

Changed in glance-store:
status: Confirmed → Won't Fix