fsck takes ages with one large partition

Bug #397745 reported by rew
This bug affects 1 person
Affects: partman-auto (Ubuntu)
Status: New
Importance: Undecided
Assigned to: Unassigned
Milestone: (none)

Bug Description

Binary package hint: debian-installer

In the old days you'd make partitions for /, /tmp, /usr, /var, /home, and several other locations of the filesystem. Nowadays I personally prefer to have just two, and I was moving towards favoring the modern Ubuntu approach: Everything on one big partition.

However, my "big" partition can no longer be fsck-ed: too many inodes/files/directories. Thus I need a modern fsck, and a directory where fsck's temporary files can be stored.

fsck author Ted Ts'o says this is usually not a problem, because root filesystems are usually not that big.

This will bite us at some point in the future if we continue to make the root encompass the whole disk.

(After 26 hours of fsck, the progress bar is at around 9%; 1.6 GB of memory and 250 MB of disk space are allocated...)
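For reference, newer e2fsprogs can be told to keep fsck's scratch data on disk instead of in RAM via e2fsck.conf; a minimal sketch, assuming the [scratch_files] section described in e2fsck.conf(5) (the directory path is only an example):

    # let e2fsck spill its in-memory tables to a scratch directory on disk
    mkdir -p /var/cache/e2fsck
    cat >> /etc/e2fsck.conf << 'EOF'
    [scratch_files]
    directory = /var/cache/e2fsck
    EOF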

Revision history for this message
Colin Watson (cjwatson) wrote : Re: [Bug 397745] [NEW] One-large-partition is not advisable.

Surely fsck is much faster on ext4? We definitely don't want to move
towards something other than one-large-partition by default - it
complicates all kinds of things, including partition management and
efficient booting.

Revision history for this message
rew (r-e-wolff) wrote : Re: One-large-partition is not advisable.

The "fsck that is fast" that you're thinking of is the one with the reboot-after-power-failure.

The fsck I'm talking about is the one that happens once every half a year (provided you reboot that often), even if no unclean reboot ever happened.

The filesystem I'm talking about is a bit extreme. It has more than a billion filenames (only about 67 million files).

If systems continue to be installed like this, natural growth will make this a problem at some point in the future.

Revision history for this message
Colin Watson (cjwatson) wrote : Re: [Bug 397745] Re: One-large-partition is not advisable.

I guess so, but I don't think changing to multiple partitions is the
answer; it just introduces too many problems to do that by default.

Colin Watson (cjwatson)
affects: debian-installer (Ubuntu) → partman-auto (Ubuntu)
summary: - One-large-partition is not advisable.
+ fsck takes ages with one large partition
Revision history for this message
Phillip Susi (psusi) wrote :

67 million files with over a billion names? That seems beyond extreme. I only have about 300k names and inodes on this system. My server I'm sure has quite a few more since it is holding a large collection of email in Maildir format, but even that I'm pretty sure is not much more than a million.

67 million inodes is about the limit for a 1 TB fs with the default mke2fs options.
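That figure follows from the default bytes-per-inode ratio, 16384 in the stock mke2fs.conf:

    2^40 bytes / 16384 bytes per inode = 2^26 = 67,108,864 inodes (about 67 million)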

I don't think this describes a valid bug...

Revision history for this message
rew (r-e-wolff) wrote :

Instead of just fixing bugs that pop up, we can also "think ahead". Even though nobody has a 4 TB drive yet, we might prepare for the fact that a few years from now they will be common.

Similarly, my report warns about the consequences of certain design decisions.

Yes, I have an extreme number of files. I've had "many files" for a long time. I remember my PC-XT hitting the 2000-file limit of Norton Commander.

There are always people with "extreme" configurations that run into problems that normal people won't hit soon. But "what is normal" will grow. Ten years ago PCs had 20-30 GB of disk space and 1 TB was outrageous. Nowadays many new PCs come with a 1 TB disk.

How do I end up with that many files? I make backups. I use rsync to create a copy of my data. Then I cp -lr the data to a directory tagged with the current date. I end up with a copy of the filesystem at every backup date, and files that don't change take up hardly any extra space, since they are just hard links.

Thus I can look back at what my filesystem looked like at every point in time.
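A minimal sketch of that scheme, with hypothetical paths (rsync into a "current" tree, then hard-link it into a dated snapshot; unchanged files end up shared between snapshots):

    rsync -a --delete /home/ /backup/current/
    cp -lr /backup/current /backup/$(date +%Y-%m-%d)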

I believe backuppc does something similar (but it didn't exist/do this back when I started backing up like this).

Revision history for this message
Phillip Susi (psusi) wrote :

I don't see 67 million files becoming the norm any time in the foreseeable future, even with multi-TB drives. Over time the size of files tends to grow much more than the number of files people have. Your average person a few years from now with a 4 TB drive will likely be using much of that storage to hold large video files or something. People with unusual and/or extreme configurations that require that many files and don't mind splitting into multiple partitions can do so, but I don't think that is the norm, and so the default configuration does not need to be changed.

I don't even see how splitting up the disk into multiple partitions would help matters any; the total time to fsck them all would be the same.

Revision history for this message
rew (r-e-wolff) wrote :

I'm pretty sure that 67000000 files is not a magic number. I'm pretty sure that 66999999 files will also be horribly slow.

Just the fact that /I/ happen to have 67 million files makes this bug invalid in your eyes?

My last "fight" with fsck resulted in an fsck time of something like three months. Provided your assumption of linear fsck time with file system parameters is valid, this would mean that 100x less files would result in an fsck time of 1 day. Unacceptable to many people. OR 1000 times less files may take 2.4 hours. How do you explain your "not much" answer to your boss after he askes: "what did you do this morning?". "I turned on my workstation this morning and it decided to fsck the root partition, and the fsck took 2.4 hours. So this morning I mostly waited for my machine to boot... "

Over the years people will have more and more files. This will continue to grow. I mentioned the fact that I was "beyond reasonable" back in 1987 with only 2200 files.

I adapt the boot stuff of my fileservers so that they will boot first, (i.e. start all services that depend on just the root) and then they start fscking the data partitions. When done they will mount them and restart services that depend on them (nfs).

This allows me to monitor their progress remotely. Stuff like that. But also I can already use the services that they provide that are not dependent on the large partition with most of the files.

If we have just one partition this is no longer possible: ALL services depend on "fsck of /" .
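A sketch of how such a deferred check can be wired up; device names, the mount point, and the nfs-kernel-server service are examples only. The data partition gets "noauto" and fs_passno 0 in /etc/fstab so boot does not wait for it:

    # /etc/fstab (excerpt)
    /dev/sda1  /      ext4  defaults  0  1
    /dev/sdb1  /data  ext4  noauto    0  0

    # run later from a local init/rc script, once the root-only services are up
    fsck -y /dev/sdb1 && mount /data && service nfs-kernel-server start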

Another reason why splitting up the disk helps is that fsck is not linear with the disk space. Especially when fsck's working set is beyond the amount of RAM available.

When fsck's temporary storage requirements exceed RAM+swap, you had better have a partition available where the temporary files can be stored. Having a root partition that holds just the system files is good: the number of files there grows at a natural pace. Those who have "more than average" storage and files will have them in /opt or /home or something like that, likely a separate partition. Thus the temporary files for the /home partition can be set up to live on /.

I work in data recovery. Some of the recoveries are mostly images. As the number of pixels increases, the effect of JPEG compression increases as well: an image from a 12-megapixel camera is not twice as big as one from a 6-megapixel camera. So, ten years from now when SLR cameras are 20-50 megapixels, we'll have 4-8 MB of JPEG data per image. People will shoot and store many more images on their 20 TB drive. Now do the math of how many images will fit...

Revision history for this message
Phillip Susi (psusi) wrote : Re: [Bug 397745] Re: fsck takes ages with one large partition

On 5/2/2011 4:27 AM, rew wrote:
> I'm pretty sure that 67000000 files is not a magic number. I'm pretty
> sure that 66999999 files will also be horribly slow.

Of course.

> Just the fact that /I/ happen to have 67 million files makes this bug
> invalid in your eyes?

No, it is the fact that this is a highly unusual configuration, one that Ted
Ts'o seems to think is an abusive and unsupported use of the fs, and
therefore one that Ubuntu should not be trying to optimize its defaults for.

> My last "fight" with fsck resulted in an fsck time of something like
> three months. Provided your assumption of linear fsck time with file
> system parameters is valid, this would mean that 100x less files would
> result in an fsck time of 1 day. Unacceptable to many people. OR 1000
> times less files may take 2.4 hours. How do you explain your "not much"
> answer to your boss after he asks: "what did you do this morning?". "I
> turned on my workstation this morning and it decided to fsck the root
> partition, and the fsck took 2.4 hours. So this morning I mostly waited
> for my machine to boot... "

If it scaled linearly then I would expect 67 million inodes to take
about 1.5 hours, given that checking an fs with 300k takes half a minute.
Perhaps this bug should be reassigned to e2fsprogs and refactored to
focus just on the pathological slowdown of fsck with extreme numbers of
inodes with many hard links.

You could also just disable the periodic fsck.
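(For the record, the periodic check can be switched off per filesystem with tune2fs; the device name below is just a placeholder:)

    tune2fs -c 0 -i 0 /dev/sdb1   # disable mount-count and time-interval checks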

> Over the years people will have more and more files. This will continue
> to grow. I mentioned the fact that I was "beyond reasonable" back in
> 1987 with only 2200 files.

Maybe in another 10-20 years your average person might get that many,
but by then I'm sure we'll be using a different fs entirely. At present
this isn't anywhere close to being an issue.

> I adapt the boot stuff of my fileservers so that they will boot first,
> (i.e. start all services that depend on just the root) and then they
> start fscking the data partitions. When done they will mount them and
> restart services that depend on them (nfs).
>
> This allows me to monitor their progress remotely. Stuff like that. But
> also I can already use the services that they provide that are not
> dependent on the large partition with most of the files.
>
> If we have just one partition this is no longer possible: ALL services
> depend on "fsck of /" .

That is the kind of thing that requires manual planning; it can not be
set up like that by default.

> Another reason why splitting up the disk helps is that fsck is not
> linear with the disk space. Especially when fsck's working set is beyond
> the amount of RAM available.

That sounds like the problem. If you are swapping heavily that would
explain the pathological slow down.

> I work in data-recovery. Some of the recoveries are mostly images. As
> the number of pixels increases, the effects of jpeg compression
> increases as well. An image from a 12Mpixel camera is not twice as big
> as one from a 6 Mpixel camera. So, ten years from now when SLR cameras
> are 20-50 Mpixels, we'll have 4-8Mbytes of jpg image data. People will
> shoot and store many more images on their 20T drive. Now do t...


Revision history for this message
rew (r-e-wolff) wrote :

On Mon, May 02, 2011 at 02:01:54PM -0000, Phillip Susi wrote:
> > Just the fact that /I/ happen to have 67 million files makes this bug
> > invalid in your eyes?

> No, it is the fact that this is a highly unusual configuration that
> Ted Tso seems to think is an abusive and unsupported use of the fs,
> and therefore, Ubuntu should not be trying to optimize its defaults
> for.

My configuration is maybe a bit extreme. I'm happy to have provided a
patch to e2fsck that improved fsck performance from about 3 months
to "only a day". Ted however hasn't moved to integrate my patch.

> > My last "fight" with fsck resulted in an fsck time of something like
> > three months. Provided your assumption of linear fsck time with file
> > system parameters is valid, this would mean that 100x less files would
> > result in an fsck time of 1 day. Unacceptable to many people. OR 1000
> > times less files may take 2.4 hours. How do you explain your "not much"
> answer to your boss after he asks: "what did you do this morning?". "I
> > turned on my workstation this morning and it decided to fsck the root
> > partition, and the fsck took 2.4 hours. So this morning I mostly waited
> > for my machine to boot... "

> If it scaled linearly then I would expect 67 million inodes to take
> about 1.5 hours given that checking an fs with 300k takes half a
> minute. Perhaps this bug should be reassigned to e2fsprogs and
> refactored to focus just on the pathological slow down of fsck with
> extreme numbers of inodes with many hard links.

I HAVE found the problem in fsck. And Fixed it. And provided the
patch.

Still it pays to think about the future. What if 1% of the Ubuntu
users happen to use backuppc and end up with lots of files and hard
links like me? Bad luck to them!

> You could also just disable the periodic fsck.

You think it doesn't serve any purpose? Why not disable it for
everybody?

> > Over the years people will have more and more files. This will continue
> > to grow. I mentioned the fact that I was "beyond reasonable" back in
> > 1987 with only 2200 files.

> Maybe in another 10-20 years your average person might get that
> many, but by then I'm sure we'll be using a different fs entirely.
> At present this isn't anywhere close to being an issue.

And f... bad luck to those who upgrade their ext3-4-5 filesystem over
the years and keep their data. And those that happen to install Ubuntu
as a backup-server using backuppc can go f... themselves.

> > I adapt the boot stuff of my fileservers so that they will boot first,
> > (i.e. start all services that depend on just the root) and then they
> > start fscking the data partitions. When done they will mount them and
> > restart services that depend on them (nfs).
> >
> > This allows me to monitor their progress remotely. Stuff like that. But
> > also I can already use the services that they provide that are not
> > dependent on the large partition with most of the files.
> >
> > If we have just one partition this is no longer possible: ALL services
> > depend on "fsck of /" .
>
> That is the kind of thing that requires manual planning; it can not be
> set up like that by default.

Right. But having...


Revision history for this message
Phillip Susi (psusi) wrote :

On 05/02/2011 12:29 PM, rew wrote:
> My configuration is maybe a bit extreme. I'm happy to have provided a
> patch to e2fsck that improved fsck performance from about 3 months
> to "only a day". Ted however hasn't moved to integrate my patch.

Ahh, is there a bug filed for that with the patch attached? If e2fsck
can be improved to handle this kind of fs better then that seems worth
pursuing.

>> You could also just disable the periodic fsck.
>
> You think it doesn't serve any purpose? Why not disable it for
> everybody?

I've been advocating that for years, but Ted still thinks it is a good
idea and it seems that the Ubuntu devs don't want to deviate from
upstream on this.

> And f... bad luck to those who upgrade their ext3-4-5 filesystem over
> the years and keep their data. And those that happen to install Ubuntu
> as a backup-server using backuppc can go f... themselves.

Or come up with creative ways to break their fs into smaller chunks,
which can not be prescribed as a default policy.

> Right. But having the system set up by default in a sensible way means
> it is possible to realize that this is necessary and adapt a system to
> this setup instead of requiring a full reinstall.

Huh? What is sensible? My guess is that what is sensible to you is not
for most others. Also you do not have to do a full reinstall to break
up the fs into smaller chunks if you realize a certain area is amassing
a huge collection of files.
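A rough sketch of such an after-the-fact split, assuming ext4 and example device names, done offline from rescue media (resize2fs requires a clean fsck before shrinking):

    e2fsck -f /dev/sda2          # full check, required before a shrink
    resize2fs /dev/sda2 800G     # shrink the filesystem
    # then shrink the partition with parted/fdisk, create a new partition in
    # the freed space, mkfs it, and move the file-heavy tree onto it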

> All this would have put someone with one big partition in big trouble
> with having to install new hardware for the partition.

Sure, but again, this is not a common occurrence and can't really be
addressed by default. What parts of the fs do you split off? Is it
/home that is the problem? Or /var/mail? Or /var/www? It depends.
For most users the problem is not having tens of millions of inodes
slowing down fsck, but simply having the hd divided into different
partitions, one of which can run out of space while the others have plenty.

> And if you have one of those applications and run backuppc and/or have
> a server that backs up a whole bunch of workstations you'll end up
> with lots of files like me.

It seems that at least Ted feels that this sort of thing is better done
with tar or dump or something instead of copying all files from each
machine and then repeatedly and deeply hard linking them, or to use an
fs that is designed to be able to do that sort of thing well, like
btrfs. Hopefully if/when this becomes popular, btrfs will be positioned
to handle it well.

Continuing to push the envelope and either enhance fsck or design a new
fs like btrfs are worthy endeavors, but at this time, there does not
seem to be a sensible change to the default policy that would benefit
more people than it would harm.

I am a bit curious though, about what you think of an idea that has been
brewing in the back of my mind for some time. That is to make use of
LVM in such a way that a default install can at least create a separate
/home volume, but leave most of the space unallocated at install time,
then have a manager program notice when a volume is close to full and
aut...

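(The growth step such a manager would perform is presumably something like the following LVM sketch, with placeholder names; resize2fs can grow an ext3/ext4 filesystem online:)

    lvextend -L +10G /dev/vg0/home   # grow the logical volume
    resize2fs /dev/vg0/home          # grow the filesystem to fill it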

Revision history for this message
rew (r-e-wolff) wrote :

On Tue, May 03, 2011 at 12:44:06AM -0000, Phillip Susi wrote:
> On 05/02/2011 12:29 PM, rew wrote:
> > My configuration is maybe a bit extreme. I'm happy to have provided a
> > patch to e2fsck to have improved fsck performance from about 3 months
> > to "only a day". Ted however hasn't moved to integrate my patch.
>
> Ahh, is there a bug filed for that with the patch attached? If e2fsck
> can be improved to handle this kind of fs better then that seems worth
> pursuing.

The patch was posted to the mailing list.

> >> You could also just disable the periodic fsck.
> >
> > You think it doesn't serve any purpose? Why not disable it for
> > everybody?
>
> I've been advocating that for years, but Ted still thinks it is a good
> idea and it seems that the Ubuntu devs don't want to deviate from
> upstream on this.

Well... My backup partition "sprung a leak" (i.e. a corruption) after
some time. My machines have average uptimes that exceed the
fsck interval by a factor of two, and I didn't have a power failure that
could have caused the corruption.

> Huh? What is sensible? My guess is that what is sensible to you is
> not for most others. Also you do not have to do a full reinstall to
> break up the fs into smaller chunks if you realize a certain area is
> amassing a huge collection of files.

I don't feel comfortable shrinking existing filesystems. But
yeah, maybe that's possible.

> > All this would have put someone with one big partition in big trouble
> > with having to install new hardware for the partition.
>
> Sure, but again, this is not a common occurrence and can't really be
> addressed by default. What parts of the fs do you split off? Is it
> /home that is the problem? Or /var/mail? Or /var/www? It depends.
> For most users the problem is not having tens of millions of inodes
> slowing down fsck, but simply having the hd divided into different
> partitions, one of which can run out of space while the others have plenty.

Well, there are several parts that are special.

First there is the root. I'm currently considering upgrading my home
workstation from Jaunty to Natty. I had the sense to partition my
drive to have several "sized-for-a-root-partition" partitions. So
instead of having to upgrade my root I can reinstall on one of the
other partitions and decide to switch when I'm satisfied it works. (I
could upgrade the existing Jaunty, risking being unable to go back if
something doesn't work well enough to be workable, or reinstall
over the current install, risking losing configuration information I
spent hours generating.)

Secondly there is "/home" which consists entirely of "to be preserved
in case of upgrade" user-data. Having that in a separate partition
helps.

If my /var/www grows out of the root partition, I simply move it to
/home/www and symlink it: ln -s /home/www /var/www.
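In full, that move is just (paths as in the comment above):

    mv /var/www /home/www
    ln -s /home/www /var/www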

So, I would argue that / and /home should be on separate partitions,
while all the others I don't care too much about.

> > And if you have one of those applications and run backuppc and/or have
> > a server that backs up a whole bunch of workstations you'll end up
> > with lots of files like me.
>
> It seems that at least Ted feels that this sort of thing is b...

