Comment 20 for bug 543617

On Tue, Apr 13, 2010 at 09:13:49PM -0000, Phillip Susi wrote:
> On 4/13/2010 4:30 PM, Launchpad Bug Tracker wrote:
> > * SAUCE: sync before umount to reduce time taken by ext4 umount
> > - LP: #543617
> This sounds more like a temporary workaround than a fix of the real bug.
> Is that the case and why? Just can't find the real problem, or it will
> take too long to fix?

I recommended doing a sync in userspace (i.e., in various shutdown
scripts and GNOME/KDE desktops) as a temporary workaround because I
didn't have time to poke at this before the Lucid release deadlines
(which are coming quite rapidly, yes). I guess the Ubuntu kernel team
decided it was easier to drop a forced sync into the kernel. I haven't
examined the patch that they ultimately chose, but presumably it's low
risk to be inserted less than two weeks before the final release date
of Lucid if it was coded correctly. Me, I probably would have stuck
the sync in userspace, but I'm super paranoid this close to an
"enterprise-quality" release date, which is what the Lucid LTS release
purports to be.
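
For illustration, here is a minimal sketch of the userspace workaround
described above: flush dirty data with sync first, then unmount. It
assumes Python 3 on Linux; `sync_then_umount` and its `runner`
parameter are hypothetical names, not taken from any Ubuntu shutdown
script or from the kernel patch that was actually shipped.

```python
import os
import subprocess


def sync_then_umount(mountpoint, runner=subprocess.run):
    """Flush dirty data to disk, then unmount (hypothetical helper)."""
    # os.sync() wraps sync(2): it asks the kernel to write out dirty
    # buffers, so the umount that follows should have less left to flush.
    os.sync()
    # Assumption: a plain `umount <mountpoint>` is the command the
    # shutdown script would otherwise have run on its own.
    cmd = ["umount", mountpoint]
    runner(cmd, check=True)
    return cmd
```

A shutdown script would call something like
sync_then_umount("/mnt/data") as root; the path is only an example.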

As far as "trying to find the real problem", if Ubuntu was paying my
salary I'd give it more time to find the root cause of this bug, but
this is a low priority bug given other things on my plate. Red Hat
employs several very high powered file system developers, so they fix
a lot more of their own distro-specific bugs. Interestingly, this is
something that hasn't shown up as a complaint on Fedora systems. I'm
not sure why; the test case Kees provided shows that this is
definitely an upstream problem, but apparently something about their
choice of desktop components, how they are configured, or their
init/hal scripts means that it's not showing up for their users in
practice.

My problem is I'm incredibly busy at the moment, and I've already
done Ubuntu a huge favor by spending ten minutes to do a quickie
investigation. Ubuntu needs to learn that it can't rely on upstream
developers to jump through flaming hoops on short notice before an LTS
release deadline as a cost-saving mechanism to avoid hiring their own
senior kernel engineers. So hiring Surbhi is definitely a step in the
right direction. (One step on a journey of ten thousand, but a step
in the right direction nonetheless. :-)

Surbhi will eventually have the experience of folks like Eric Sandeen
and Josef Bacik, or Jan Kara at SuSE, and eventually hopefully she'll
be able to fix bugs like this quickly. Someone who is an ext4 expert
probably could localize this down in less than a day, especially given
my "ten minute investigation" to point them in the right direction.
The fact that "sync" on the command line causes the right thing to
happen, while "umount" with dirty inodes extant doesn't, is a pretty
strong hint of where to look, and no, the root cause is probably not
the jbd2 layer as Surbhi has suggested.

      - Ted

P.S. Next thing for Ubuntu to learn --- how to pay their engineers
well enough, and how to give them enough time to work on upstream
issues, that once they gain that experience on Ubuntu's dime and
become well known in the open source community, they don't end up jumping
ship to companies like Red Hat or Google. :-)

On the other hand, if Ubuntu management doesn't learn, that's also OK.
Google is hiring. :-)