Ubuntu
e2fsprogs package

Bug #556621
Comment #19

Comment 19 for bug 556621

Theodore Ts'o (tytso) wrote on 2010-06-14:

>Couldn't/shouldn't systems with broken real time clocks be fixed to
>force the system clock up to mkfs time before mounting the root fs, and
>wouldn't that take care of that and other problems?

Unfortunately, it's not so simple. What about people who run an older system, such as an Ubuntu LTS release, and then upgrade to a newer e2fsprogs? What if someone else at Ubuntu isn't aware of this dependency and releases a version of e2fsprogs with this change?

More seriously, what if there is more than one filesystem? What about a USB storage device containing an extN filesystem which is hotplugged after system boot?

What about someone running a Live CD system with a crappy clock set far in the future, who then formats the filesystem? What if they install a system using an Ubuntu installation CD with the date far in the future?

I agree with you that it would be highly convenient if we could count on the system clock. I come from the Unix world, where we could, and it makes life much easier. In the good old days, Multics simply wouldn't allow the system to come up at all if doing so would result in time going backwards, and its designers then used this property in all sorts of really cool ways. (Heh, Multics could even detect a file system corruption and then repair it while the file system stayed mounted. It could even survive a third of its memory suddenly disappearing, killing only the processes whose memory had disappeared, after the janitor plugged the floor waxer into the wrong circuit, blowing an electrical breaker and disabling one of the cabinets containing some of the system's memory.)

Unfortunately, we don't live in that world. We live in the world of clueless users, people who want to dual boot Windows, and crap hardware with CMOS crystals that are off by plus or minus 20%, so simply keeping a crapola tablet or embedded device turned off while it is shipped from Taiwan to the US in a container ship will leave the clock at some random time. We live in the world of crappy virtualization manager software which sets the CMOS time from the Unix system time and then doesn't bother to do the time zone virtualization. (I'm looking at you, Ubuntu; the bug was filed in Launchpad a while back IIRC --- no, not against the virtualization manager, but against e2fsprogs; it's always e2fsprogs' fault when something goes wrong because people can't deal with system clock bugs.) And of course, the clock could be bad in the factory in Taiwan where the system is installed.

And because of this, the sorts of problems are legion. The heuristic I proposed won't handle the case where s_mkfs_time is set in the future because the clock was bad at installer time: the system boots, the time gets warped back to the correct time by NTP, the hardware clock is set correctly, and on the next reboot e2fsck with your proposed patch goes wild and starts deleting inodes as "belonging to the previous filesystem format".
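To make that failure sequence concrete, here is a minimal sketch in Python. It assumes the heuristic boils down to "an inode whose ctime is earlier than s_mkfs_time must predate the current filesystem"; that comparison, the `Inode` class, and `looks_stale` are illustrative assumptions, not code from e2fsck or from the proposed patch.

```python
from dataclasses import dataclass


@dataclass
class Inode:
    number: int
    ctime: int  # seconds since the epoch, as stamped by the system clock


def looks_stale(inode: Inode, s_mkfs_time: int) -> bool:
    """Hypothetical heuristic (an assumption, not e2fsck code): treat an
    inode as left over from a previous filesystem if it claims to have
    been changed before mkfs ran."""
    return inode.ctime < s_mkfs_time


# The installer runs with a clock roughly ten years fast, so the
# superblock's s_mkfs_time ends up far in the future.
REAL_NOW = 1_276_473_600              # 2010-06-14 00:00:00 UTC
BAD_CLOCK_OFFSET = 10 * 365 * 86_400  # about ten years, in seconds
s_mkfs_time = REAL_NOW + BAD_CLOCK_OFFSET

# After the first boot NTP corrects the clock, so a file created an hour
# later carries a real, much earlier timestamp.
fresh_inode = Inode(number=12, ctime=REAL_NOW + 3_600)

# The inode is perfectly valid, yet the heuristic condemns it.
print(looks_stale(fresh_inode, s_mkfs_time))
```

Any check of this shape trusts two clock readings taken at different times, and nothing guarantees that either one was right.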

>I'd rather avoid the need to zero the table completely since that has
>negative consequences other than just using up disk IO in the
>background. For instance, if the fs is on a snapshot, thin provisioned
>san disk, or SSD, the writes cause allocations that aren't needed just
>to hold zeroes, which reads there would already return.

First of all, it's no worse than it is today given how mke2fs works, which zeros the inode table in the foreground. Secondly, I'd much rather have correct behaviour than lose some users' data just to optimize mke2fs time. The snapshot case only applies if you are making a snapshot in the first 15 minutes after the system is installed; after that the inode table will have been zeroed, so this isn't a big deal. As far as the SSD is concerned, *if* we know reliably that it is an SSD (perhaps because it doesn't take long to read in a few block groups' worth of inode table blocks), then we could also simply verify that the inode table blocks have been cleared after issuing the BLKDISCARD ioctl, and if so, we don't need to zero the blocks.
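The "verify after discard" idea can be sketched as a read-back check. The example below is only an illustration under stated assumptions: `blocks_are_zero` is a hypothetical helper, it works on any seekable binary file object, and it does not issue BLKDISCARD itself. It shows only the check that would decide whether explicit zeroing can be skipped.

```python
import io
from typing import BinaryIO


def blocks_are_zero(dev: BinaryIO, offset: int, length: int,
                    chunk: int = 1 << 20) -> bool:
    """Return True only if every byte in [offset, offset + length) reads
    back as zero. Hypothetical helper, not part of e2fsprogs."""
    dev.seek(offset)
    remaining = length
    while remaining > 0:
        buf = dev.read(min(chunk, remaining))
        if not buf:
            return False  # short read: cannot confirm, so fall back to zeroing
        if buf != bytes(len(buf)):
            return False  # found a non-zero byte
        remaining -= len(buf)
    return True


# Stand-ins for a region of inode table blocks after a discard.
clean = io.BytesIO(bytes(8192))
dirty = io.BytesIO(bytes(4096) + b"\x01" + bytes(4095))

print(blocks_are_zero(clean, 0, 8192))  # discard left zeroes: skip the writes
print(blocks_are_zero(dirty, 0, 8192))  # stale data survived: zero explicitly
```

The check errs on the side of zeroing: anything short of a complete all-zero read is treated as "not cleared".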

The other long-term solution to this problem is to add checksum fields to protect the inode structures, with the inode number and an FS-unique random seed included in the checksum calculation. This requires a read-only incompat file system feature flag, which means older kernels won't be allowed to modify such file systems (since otherwise they would break the checksum), so it's something that would have to be phased in. But it's the right long-term solution.
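A rough sketch of why such a checksum removes the dependence on the clock: an inode left behind by a previous mkfs was checksummed under a different seed, so it fails verification whatever its timestamps say. The field layout, the function names, and the use of `zlib.crc32` are all assumptions made for illustration; no on-disk format is specified here, and CRC-32 is simply the checksum available in the Python standard library.

```python
import struct
import zlib


def inode_checksum(fs_seed: int, inode_number: int, inode_body: bytes) -> int:
    """Checksum covering the FS-unique seed, the inode number, and the
    inode contents. Layout and CRC-32 are illustrative assumptions."""
    prefix = struct.pack("<II", fs_seed, inode_number)
    return zlib.crc32(prefix + inode_body)


def inode_is_valid(fs_seed: int, inode_number: int,
                   inode_body: bytes, stored: int) -> bool:
    return inode_checksum(fs_seed, inode_number, inode_body) == stored


old_seed = 0x1234ABCD   # seed chosen when the previous filesystem was made
new_seed = 0x0BADF00D   # seed chosen by the latest mkfs
body = b"mode/uid/size/timestamps/blocks ..."  # placeholder inode contents

# An inode written under the old filesystem, still sitting in the table.
stale_sum = inode_checksum(old_seed, 12, body)

print(inode_is_valid(old_seed, 12, body, stale_sum))  # valid for the old fs
print(inode_is_valid(new_seed, 12, body, stale_sum))  # rejected by the new fs
print(inode_is_valid(old_seed, 13, body, stale_sum))  # wrong slot is rejected too
```

Folding the inode number into the checksum also catches an inode block that was written to the wrong location, which a timestamp comparison could never detect.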

The bottom line is that given today's crappy hardware, and machines which can't be counted upon to be maintained by professional system administrators, trying to depend on the clock is just going to get you in trouble. Ubuntu has taught me this lesson all too well....
