Jaunty: Buffer I/O error on casper-rw persistence partition

Bug #371477 reported by Alex Roper
This bug affects 1 person
Affects          Status  Importance  Assigned to  Milestone
casper (Ubuntu)  New     Undecided   Unassigned

Bug Description

Binary package hint: casper

Hi,

I created a Jaunty liveboot off a USB key using Unetbootin. After repartitioning and creating a casper-rw partition later on the drive (5.0 GB), I booted it, installed a bunch of things, and shut down cleanly. A bunch of buffer I/O errors were dumped to the console after I hit ENTER at the "remove media" prompt (I don't know if they were already there on a different vt and I just didn't see them).

Fsck revealed many errors on the device.

I repeated this after erasing the persistence partition, and this time manually synced after installing all my fun stuff before shutting down. Same error. Very similar fsck output, in fact. This time I booted to it and didn't notice any obvious stability issues.

I play with persistence quite a bit on Ubuntu and haven't run into this before, but I last used 8.04 and was just making a new one.

Not sure just what other info to include in this; please let me know and I'll see if I can get it for you. If I had to take a stab I'd guess something about not remounting r/o the unionfs and/or backing persistence before shutdown, but I'm no expert in these things. The persistence partition is ext2; I can repeat with a journaled fs of your choice if that would make this easier to debug.

---

ubuntu@ubuntu:~$ lsb_release -rd
Description: Ubuntu 9.04
Release: 9.04

Alex Roper (alexr-ugcs) wrote :

Just realized that the buffer I/O errors are dumped /after/ I hit ENTER, not before it, and this occurs even if I made no substantial changes to the persistence partition.

So essentially, if the user just leaves it in because they want to reboot, they get slammed with persistence corruption. If I just remove the drive when it tells me to, the Ubuntu shutdown is unable to corrupt it.

It would be nice not to corrupt it, but at least this means I don't need this fixed to advance my embedded systems project:-)

Alex Roper (alexr-ugcs) wrote :

fsck log. I got almost these exact same errors (down to inode number) all three times I tried this: whether or not I sync, whether or not I make substantial changes. I can provide a complete dd of the device if you want it, or just the e2fs debug data.

alexr@autumn:~$ sudo fsck.ext2 /dev/sdb2
e2fsck 1.41.3 (12-Oct-2008)
casper-rw was not cleanly unmounted, check forced.
Pass 1: Checking inodes, blocks, and sizes
Deleted inode 15 has zero dtime. Fix<y>? yes

Deleted inode 16 has zero dtime. Fix<y>? yes

Deleted inode 17 has zero dtime. Fix<y>? yes

Deleted inode 2682 has zero dtime. Fix<y>? yes

Pass 2: Checking directory structure
Pass 3: Checking directory connectivity
Pass 4: Checking reference counts
Pass 5: Checking group summary information
Block bitmap differences: -22528 -(22536--22569) -(22576--22628) -(22631--22638) -(22651--22659) -(22667--22668)
Fix<y>? yes

Free blocks count wrong for group #0 (19632, counted=19739).
Fix<y>? yes

Free blocks count wrong (1371107, counted=1371214).
Fix<y>? yes

Inode bitmap differences: -(15--17) -2682
Fix<y>? yes

Free inodes count wrong for group #0 (5495, counted=5499).
Fix<y>? yes

Free inodes count wrong (357336, counted=357340).
Fix<y>? yes

casper-rw: ***** FILE SYSTEM WAS MODIFIED *****
casper-rw: 10580/367920 files (0.4% non-contiguous), 98226/1469440 blocks
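
As a cross-check on the log above, the entries e2fsck lists under "Block bitmap differences" should add up to the change in the free block count (19739 - 19632 = 107). The following is a minimal sketch, not part of casper or e2fsprogs: the function name count_bitmap_entries is made up for illustration, and it assumes the entry syntax seen in this log (a single number, or an inclusive range written as (first--last)).

```shell
#!/bin/sh
# Count the blocks (or inodes) named on an e2fsck "... bitmap differences:"
# line. Entries look like "-22528" (one item) or "-(22536--22569)"
# (an inclusive range).
count_bitmap_entries() {
    printf '%s\n' "$1" | awk '
    {
        for (i = 1; i <= NF; i++) {
            entry = $i
            # Skip words such as "Block", "bitmap" and "differences:".
            if (entry !~ /^[-+]\(?[0-9]+(--[0-9]+)?\)?$/) continue
            gsub(/[()+]/, "", entry)   # drop parentheses and any plus sign
            sub(/^-/, "", entry)       # drop the leading minus sign
            n = split(entry, ends, "--")
            if (n == 2) total += ends[2] - ends[1] + 1
            else        total += 1
        }
    }
    END { print total + 0 }'
}

# Usage:
#   count_bitmap_entries "Inode bitmap differences: -(15--17) -2682"
```

Given the Pass 5 block line from this log the function reports 107, and the inode line gives 4, which matches 5499 - 5495. Both agree with the corrected counts e2fsck prints.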

Alex Roper (alexr-ugcs) wrote :

Scratch that, the exact same set of blocks is corrupted in the same way even if I follow Ubuntu's instruction to remove the disk. Looks like I'm blocking on this bug. Hope it works out:-)

Aleksey Makarenko (magalex28) wrote :

I ran into the same problem, first on Ubuntu 8.10. I hoped it would be fixed in Ubuntu 9.04, but no.
The problem occurs not only with an ext2 partition, but also with a file named casper-rw.
Alex Roper, did you read this thread? https://bugs.launchpad.net/ubuntu/+source/casper/+bug/125702
Maybe you will find the answer there? I am a novice in Linux, so I understood virtually nothing of it...
Sorry for the bad English, I use an online translator...

Alex Roper (alexr-ugcs) wrote :

I have a swap partition. Swapoff before shutdown does not fix the problem. Can we make anything out of the fact that the output of fsck is the same across different-sized partitions, filesystem contents, and usb keys?

Aleksey Makarenko (magalex28) wrote :

I think we should start from the fact that "casper-rw was not cleanly UNMOUNTED". If casper-rw was unmounted at all... I am not an expert, but I think we have a bug in the scripts(?) responsible for starting and stopping the system...
Here is my e2fsck output:

root@PartedMagic:~# e2fsck /dev/sdb2
e2fsck 1.41.4 (27-Jan-2009)
casper-rw was not cleanly unmounted, check forced.
Pass 1: Checking inodes, blocks, and sizes
Deleted inode 15 has zero dtime. Fix<y>? yes

Deleted inode 16 has zero dtime. Fix<y>? yes

Deleted inode 17 has zero dtime. Fix<y>? yes

Deleted inode 18 has zero dtime. Fix<y>? yes

Deleted inode 19 has zero dtime. Fix<y>? yes

Deleted inode 20 has zero dtime. Fix<y>? yes

Deleted inode 21 has zero dtime. Fix<y>? yes

Deleted inode 22 has zero dtime. Fix<y>? yes

Deleted inode 23 has zero dtime. Fix<y>? yes

Deleted inode 24 has zero dtime. Fix<y>? yes

Deleted inode 25 has zero dtime. Fix<y>? yes

Deleted inode 26 has zero dtime. Fix<y>? yes

Deleted inode 27 has zero dtime. Fix<y>? yes

Deleted inode 28 has zero dtime. Fix<y>? yes

Deleted inode 29 has zero dtime. Fix<y>? yes

Deleted inode 30 has zero dtime. Fix<y>? yes

Pass 2: Checking directory structure
Pass 3: Checking directory connectivity
Pass 4: Checking reference counts
Pass 5: Checking group summary information
Block bitmap differences: -(678--679) -(686--873) -(876--878) -(880--892) -(896--904) -920 -20480 -(20488--20499) -(20504--20576) -(20578--20583) -20586 -(20597--20603) -(20605--20617) -(30720--30723) -(30728--30739) -(30744--30940) -(30959--30972) -30975 -30977 -(30980--30981)
Fix<y>? yes

Free blocks count wrong for group #0 (31530, counted=32090).
Fix<y>? yes

Free blocks count wrong (680208, counted=680768).
Fix<y>? yes

Inode bitmap differences: -(15--30)
Fix<y>? yes

Free inodes count wrong for group #0 (7843, counted=7859).
Fix<y>? yes

Free inodes count wrong (179668, counted=179684).
Fix<y>? yes

casper-rw: ***** FILE SYSTEM WAS MODIFIED *****
casper-rw: 1372/181056 files (7.4% non-contiguous), 42157/722925 blocks
root@PartedMagic:~#
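
e2fsck prints "was not cleanly unmounted" when the file system state recorded in the superblock is not clean, and that state can be read without running a full check. A minimal sketch, assuming the e2fsprogs tool dumpe2fs is available and that /dev/sdb2 is the persistence partition as in the logs above; the helper name fs_state is hypothetical.

```shell
#!/bin/sh
# Extract the "Filesystem state:" value from `dumpe2fs -h` output on stdin.
fs_state() {
    awk -F': *' '/^Filesystem state:/ { print $2 }'
}

# Usage (needs root; device name taken from the logs above):
#   sudo dumpe2fs -h /dev/sdb2 2>/dev/null | fs_state
```

Running this from another machine right after a shutdown should show whether the shutdown scripts left the partition in the "not clean" state, before fsck changes anything.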

Alex Roper (alexr-ugcs) wrote :

So here's the issue I'm having. I tried messing with rc0.d/S89casper to sync the filesystem right before and after the HIT ENTER prompt, but this didn't help. Nor did attempting to remount root read-only.

My intuition here is that since Ubuntu hides the original casper-rw mount point outside our chrooted/pivoted environment, it is interfering. I should add that I don't have that much experience with such things, though I did make Debian boot from a cryptroot well before it was an option in the installer.

If we could somehow break out of the chroot to remount the ext2 partition read only that might do it, but I just have no clue how to do that or even if it's possible.

It is somewhat troubling to me that this continues to occur even if I put casper-rw on a DIFFERENT usb key. As far as I can tell there is no way to make a persistent partition WITHOUT corruption issues. Once we get persistence working on separate devices we can figure out how to get it on the same device.

In the meantime, I'll just make the necessary changes to my system in the cramfs, or change the init scripts to boot the persistence partition read-only for my project and preload it with all the stuff I need. I really don't have time to debug this now; I don't graduate if I'm not done by May 11, and I can't afford another year.

I may revisit this issue later when I again have time. I will still try to respond to questions and to give aid reproducing the issue if anyone else wants to work on this.

Disclaimer: I may be wrong about all of the above. I know my way around a Linux system but am no Ubuntu specialist.

Steve Dodd (anarchetic) wrote :

Hi, this looks very much like a duplicate of bug #125702.

ext3 rather than ext2 seems to mask the problem pretty effectively, but as ext3 is probably less flash-friendly (extra writes for the journal, and in the same place which may wear a hole in anything that's not doing wear-levelling) it would be nice to see this fixed.

I've got a patch in bug #125702 that seems to sort this out properly: one file in the initrd needs changing to expose the cow fs, and /etc/init.d/casper (in the main fs) needs changing to mount the cow fs r/o (umounting it completely doesn't work, which I guess is fair enough).
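
The read-only remount described above could look roughly like the following. This is a sketch, not the patch from bug #125702: the mount point /cow and both function names are assumptions for illustration, and the real scripts may differ.

```shell
#!/bin/sh
# Succeeds if the mount point $1 is listed as read-write in
# /proc/mounts-style text read from stdin.
mount_is_rw() {
    awk -v mp="$1" '
        $2 == mp {
            n = split($4, opts, ",")
            for (i = 1; i <= n; i++) if (opts[i] == "rw") found = 1
        }
        END { exit !found }'
}

# Flush pending writes, then remount $1 read-only if it is still writable.
# Intended to run as root late in shutdown.
remount_ro() {
    if mount_is_rw "$1" < /proc/mounts; then
        sync
        mount -o remount,ro "$1"
    fi
}

# Usage (hypothetical mount point):
#   remount_ro /cow
```

Remounting read-only rather than unmounting follows the comment above, which notes that unmounting the cow fs completely does not work.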
