Unable to install using ext4

Bug #512002 reported by Jamin W. Collins
This bug affects 1 person
Affects: linux (Ubuntu)
Status: Expired
Importance: Undecided
Assigned to: Unassigned

Bug Description

Binary package hint: ubiquity

I've tried both the 9.10 Desktop and 9.10 Alternate 32-bit installation CDs. In both cases the installation failed while installing the base system, with what appeared to be a kernel oops referencing the ext4 file system and journal commit errors.

Manually specifying ext3 for the partitions allowed the installation to complete and produced a fully functional system.

Memtest has been run on the system for 15 passes with no errors reported.

ProblemType: Bug
Architecture: i386
Date: Sun Jan 24 10:42:14 2010
DistroRelease: Ubuntu 9.10
Package: ubiquity (not installed)
ProcEnviron:
 LANG=en_US.UTF-8
 SHELL=/bin/bash
ProcVersionSignature: Ubuntu 2.6.31-17.54-generic
SourcePackage: ubiquity
Uname: Linux 2.6.31-17-generic i686
XsessionErrors:
 (gnome-settings-daemon:1432): GLib-CRITICAL **: g_propagate_error: assertion `src != NULL' failed
 (gnome-settings-daemon:1432): GLib-CRITICAL **: g_propagate_error: assertion `src != NULL' failed
 (polkit-gnome-authentication-agent-1:1499): GLib-CRITICAL **: g_once_init_leave: assertion `initialization_value != 0' failed
 (nautilus:1491): Eel-CRITICAL **: eel_preferences_get_boolean: assertion `preferences_is_initialized ()' failed

Jakob Unterwurzacher (jakobunt) wrote :

Do you have a screenshot or other capture of the kernel oops? Is the disk okay? (Check with smartctl -a /dev/sda.)

Jamin W. Collins (jcollins) wrote :

The drive does have a number of attributes listed as Old_age, which is understandable because it is a very old drive, along with a few listed as Pre-fail. However, the overall smartctl result is a pass.

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

I've attached the testing output. I'll see about doing another installation and try to capture the error being reported.
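For reference when reading a smartctl report: each SMART attribute carries a TYPE of either Pre-fail or Old_age, which describes what kind of attribute it is rather than saying it has failed. Whether an attribute has actually crossed its threshold is recorded in the WHEN_FAILED column ("-" means never). A minimal sketch, assuming the attribute table layout printed by `smartctl -A` (`ID# ATTRIBUTE_NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE`); the function name `smart_failed_attrs` is hypothetical:

```shell
# smart_failed_attrs: read `smartctl -A` output on stdin and print
# "NAME TYPE WHEN_FAILED" for every attribute whose WHEN_FAILED column is
# not "-". Prints "none" if no attribute has ever failed.
smart_failed_attrs() {
    awk '$1 ~ /^[0-9]+$/ && NF >= 10 {      # attribute rows start with a numeric ID
             if ($9 != "-") { print $2, $7, $9; n++ }
         }
         END { if (!n) print "none" }'
}
```

Usage, assuming smartmontools is installed: `sudo smartctl -A /dev/sda | smart_failed_attrs`. Only fields 1 to 9 are used because RAW_VALUE may itself contain spaces.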

Jamin W. Collins (jcollins) wrote :
Jamin W. Collins (jcollins) wrote :
Jamin W. Collins (jcollins) wrote :
Jamin W. Collins (jcollins) wrote :
Jamin W. Collins (jcollins) wrote :

As you can see from the attached screenshots, the point at which the error happens isn't consistent, but it does always appear to happen when using ext4. I have yet to experience the problem with ext3.

I was not able to capture dmesg output from the first install, because connecting the USB stick to the system caused it to lock up. For installs 2 and 3 I started with the USB stick already connected.

Monkey (monkey-libre) wrote :

Please attach the information described at the link below. Thank you for making Ubuntu better.

https://wiki.ubuntu.com/DebuggingUbiquity/AttachingLogs

tags: added: iso-testing
Changed in ubiquity (Ubuntu):
status: New → Incomplete
Jamin W. Collins (jcollins) wrote :

From what I can see, the referenced page is mainly about Ubiquity crashing. That is not the case here: Ubiquity does not crash, it reports an error and exits, but leaves no viable installation. The crash that is detected is redirected to bug #452208, but that report says it is a non-critical crash after which the installation should complete. My installation does not complete.

Attaching the files anyway.

Jamin W. Collins (jcollins) wrote :
Jamin W. Collins (jcollins) wrote :
Jakob Unterwurzacher (jakobunt) wrote :

From dmesg-2:
[ 535.243216] end_request: I/O error, dev sda, sector 38039071
[ 535.248144] Aborting journal on device sda1:8.
[ 535.265712] EXT4-fs error (device sda1): ext4_journal_start_sb: Detected aborted journal
[ 535.265727] EXT4-fs (sda1): Remounting filesystem read-only

dmesg-3:
[ 997.211783] end_request: I/O error, dev sda, sector 38032023

It looks like the drive has trouble starting around sector 38032023, despite SMART saying it has 0 bad sectors.
That ext3 is not affected may just be luck: ext3 does not happen to keep important data there.
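To collect the failing sectors from several logs at once, a small sketch that pulls the device and sector out of every `end_request: I/O error` line and sorts them by sector. It assumes the 2.6.31-era message format quoted above, and the function name `io_error_sectors` is hypothetical. These are absolute 512-byte sector numbers on the whole disk (sda), not offsets inside sda1.

```shell
# io_error_sectors: read kernel log text on stdin and print "DEVICE SECTOR"
# for every "end_request: I/O error" line, sorted numerically by sector.
io_error_sectors() {
    sed -n 's|.*end_request: I/O error, dev \([a-z0-9]*\), sector \([0-9]*\).*|\1 \2|p' |
        sort -k2,2n |
        uniq
}
```

Usage: `cat dmesg-2.log dmesg-3.log | io_error_sectors`.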

You could use
badblocks /dev/sda1
to verify whether bad blocks are causing the trouble.
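The kernel reports absolute 512-byte sectors on sda, while badblocks run on /dev/sda1 counts blocks from the start of the partition (1024-byte blocks unless `-b` says otherwise), so the failing sectors need converting before badblocks can be pointed at just that region: block = (sector - partition_start) × 512 / block_size. A sketch of that arithmetic; the partition start of 63 used in the example is an assumption (typical for disks partitioned at the time), and the real value can be read from /sys/block/sda/sda1/start. The function name `sector_to_part_block` is hypothetical.

```shell
# sector_to_part_block SECTOR PART_START BLOCK_SIZE
# SECTOR and PART_START are absolute 512-byte sector numbers on the whole
# disk; BLOCK_SIZE is the block size badblocks runs with (default 1024).
# Prints the block number relative to the start of the partition.
sector_to_part_block() {
    sector=$1
    part_start=$2
    block_size=$3
    if [ "$sector" -lt "$part_start" ]; then
        echo "sector $sector is before the partition start $part_start" >&2
        return 1
    fi
    echo $(( (sector - part_start) * 512 / block_size ))
}
```

With a start of 63, sector 38039071 maps to block 19019504 and sector 38032023 to block 19015980, so something like `badblocks -b 1024 -nsv /dev/sda1 19020000 19015000` would exercise only that stretch (badblocks takes the last block before the first).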

The Ubuntu installer does not check for bad blocks when formatting. If you want to use this drive, I think you will have to format it manually (mkfs.ext4 -cc /dev/sda1; giving -c twice runs the slower read-write bad-block test) and then tell the installer to use the existing file system.

Jamin W. Collins (jcollins) wrote :

While I see your logic, the command did not report any errors on the drive.

Narcis Garcia (narcisgarcia) wrote :

I have left my old bug,
https://bugs.launchpad.net/ubuntu/+source/debian-installer/+bug/503848
to subscribe to this more specific one.

I think the package affected is partman-ext3 ("Add to partman support for ext3 and ext4").
I can select ext3 for normal partitions (which works for traditional installations), but it is not possible to partition manually, create an encrypted volume (needed because this is a multi-system hard disk), and then change that encrypted volume's default ext4 format when configuring it.

Steps to reproduce:
1. Make a boot CD with ubuntu-9.10-alternate-i386.iso
2. Boot, select language and "install option", keyboard and time zone
3. Give a name for the system
4. Choose the "manual" partitioning method
5. In the unpartitioned space, create a primary partition and mark it to use as "physical volume for encryption"
6. Select the "Configure encrypted volumes" option, then choose the new partition, accept defaults, and write the password as required
7. Select the newly created encrypted volume at the top of the list.
8. It is set to ext4 by default. When you select this item to change it, partman crashes and tries to restart, showing a "Starting up the partitioner" message that never completes; you cannot return to the partitioner or installer in this session (a reboot is needed, and there is no workaround).

Since the Desktop installer offers no manual partitioning with encryption, and the Debian Installer selects ext4 by default in manual partitioning (and crashes if you try to change it):
IT DOES NOT SEEM POSSIBLE TO INSTALL KARMIC ON AN ENCRYPTED VOLUME WITHOUT ERASING THE WHOLE DISK.

Jakob Unterwurzacher (jakobunt) wrote :

Actiu, I don't think this is related to this problem/bug report. We have neither LVM nor encryption here.

Jamin, do you have any data on this drive? The thing is,
badblocks /dev/sda1
does a read-only test. The journal error seems to be happening on write.

Could you try this (booted from the Live CD):
badblocks -nvs /dev/sda1
(non-destructive read-write)

Or even
badblocks -wvs /dev/sda1
(destructive read-write, THIS WILL ERASE THE WHOLE PARTITION, and will take a long time.)

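(If you don't care about data loss and want quick results, try this:
dd if=/dev/zero of=/dev/sda bs=512 count=10000 seek=38032000 oflag=direct
dd if=/dev/sda of=/dev/null bs=512 count=10000 skip=38032000 iflag=direct
oflag=direct and iflag=direct make dd bypass the page cache, so both the write and the read-back actually go to the drive.
THIS WILL ERASE WHATEVER DATA IS STORED IN THAT REGION OF THE DISK.)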
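With bs=512, seek= (output side) and skip= (input side) are counted in 512-byte sectors, so the two dd commands touch sectors 38032000 through 38041999 of sda. A quick sketch to confirm that both failing sectors from the logs fall inside that window; the function name `dd_window_covers` is hypothetical.

```shell
# dd_window_covers START COUNT SECTOR...
# With bs=512, `seek=START count=COUNT` (or skip= on the input side) touches
# sectors START .. START+COUNT-1. Print whether each SECTOR is in that range.
dd_window_covers() {
    start=$1
    count=$2
    shift 2
    for s in "$@"; do
        if [ "$s" -ge "$start" ] && [ "$s" -lt $((start + count)) ]; then
            echo "$s inside"
        else
            echo "$s outside"
        fi
    done
}
```

Usage: `dd_window_covers 38032000 10000 38032023 38039071` reports both sectors as inside.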

Jamin W. Collins (jcollins) wrote :

@Jakob
There's no data on the drive; replicating this issue means letting it be completely erased, since the installer erases and automatically partitions the entire drive.

It is currently running the destructive read-write test you suggested. As you can see from the initial output, it isn't having any problems so far. I'll post the final output once it has completed.

$ sudo badblocks -wvs /dev/sda1
Checking for bad blocks in read-write mode
From block 0 to 38331057
Testing with pattern 0xaa: done
Reading and comparing: done

Narcis Garcia (narcisgarcia) wrote :

I found the problem with LVM and encryption because I tried the alternate installation with those configurations.
The problem I found (on 3 different computers) can be reproduced with:
- Xubuntu 9.10 alternate-CD i386
- Ubuntu 9.10 alternate-CD i386
- Ubuntu 10.04 (in development) Server i386

All the other installations I've made (many of them) succeeded with ext4 using the Desktop CD installer.

Jamin W. Collins (jcollins) wrote :

Test completed with no bad blocks found:

$ sudo badblocks -wvs /dev/sda1
Checking for bad blocks in read-write mode
From block 0 to 38331057
Testing with pattern 0xaa: done
Reading and comparing: done
Testing with pattern 0x55: done
Reading and comparing: done
Testing with pattern 0xff: done
Reading and comparing: done
Testing with pattern 0x00: done
Reading and comparing: done
Pass completed, 0 bad blocks found.

Jakob Unterwurzacher (jakobunt) wrote :

@Actiu: Do you get an error message like this? http://launchpadlibrarian.net/38337359/install3.jpg If not, then I think you are seeing a different problem. I have marked your bug 503848 as not a duplicate to prevent confusion.

@Jamin: Hmm, just as the SMART data said, the drive has no bad sectors. (Side note: Old_age just means that the attribute reflects normal ageing and wear, and Pre-fail means that the attribute signals imminent failure if its value drops to its threshold.)

But this error (dmesg-install2.log) suggests that either the drive or the controller is doing something wrong.
[ 525.000074] ata1.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen
[ 525.000092] ata1.00: cmd e7/00:00:00:00:00/00:00:00:00:00/a0 tag 0
[ 525.000094] res 40/00:00:00:4f:c2/00:00:00:00:00/00 Emask 0x4 (timeout)
[ 525.000098] ata1.00: status: { DRDY }
[ 530.004017] ata1: link is slow to respond, please be patient (ready=0)
[ 535.040086] ata1: device not ready (errno=-16), forcing hardreset
[ 535.040101] ata1: soft resetting link
[ 535.224838] ata1.00: configured for UDMA/66
[ 535.224852] ata1.00: device reported invalid CHS sector 0
[ 535.224894] ata1: EH complete
[ 535.243216] end_request: I/O error, dev sda, sector 38039071

I don't understand why ext3 does not trigger it, but this error is independent of the filesystem. Faulty hardware or an ATA driver bug?
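In the libata exception above, the `cmd e7/...` field is the command that timed out, and its first byte is the ATA opcode. A sketch that extracts that byte and names a few common opcodes from the ATA command set; 0xE7 is FLUSH CACHE, so the command that timed out in dmesg-install2.log was a cache flush rather than a read or a write. The function name `ata_failed_cmd` and the short opcode table are illustrative, not exhaustive.

```shell
# ata_failed_cmd: read kernel log lines on stdin and, for each libata
# "cmd XX/..." line, print the opcode XX and its ATA command name.
ata_failed_cmd() {
    sed -n 's|.*ata[0-9.]*: cmd \([0-9a-f][0-9a-f]\)/.*|\1|p' |
    while read -r op; do
        case "$op" in
            c8) name="READ DMA" ;;
            ca) name="WRITE DMA" ;;
            25) name="READ DMA EXT" ;;
            35) name="WRITE DMA EXT" ;;
            e7) name="FLUSH CACHE" ;;
            ea) name="FLUSH CACHE EXT" ;;
            *)  name="unknown (not in this short table)" ;;
        esac
        printf '%s %s\n' "$op" "$name"
    done
}
```

Usage: `ata_failed_cmd < dmesg-install2.log`.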

Jamin W. Collins (jcollins) wrote :

I would agree, except that my experience indicates otherwise. If I install the system as ext3, I can use it without issue, even while stressing both the system and the drive by running VMs under KVM. If I can install the base system, apply all updates, fully allocate an 8 GB disk image for KVM, and then run a VM against that image on the very same hardware that fails to even complete an ext4 base system installation, I'd say it's pretty safe to conclude that the hardware is operating properly. Simply changing the target file system to ext4 causes things to fail.

Jamin W. Collins (jcollins) wrote :

Additionally, the badblocks run performed 4 complete writes and reads of every location that could possibly be used by ext3 or ext4 on the drive (i.e. all of sda1). Would running something like the following help rule out the controller and drive for you?

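DISK=/dev/sda   # WARNING: every pass overwrites the ENTIRE disk
dmesg > dmesg-before.log
for i in $(seq 1 10); do
    # Fill the disk with random data; dd stops with "No space left on device"
    dd if=/dev/urandom of="$DISK" bs=1M iflag=fullblock oflag=direct
    # Read it all back, bypassing the page cache so every read hits the drive
    dd if="$DISK" of=/dev/null bs=1M iflag=direct
done
dmesg > dmesg-after.log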

I'd think if this were a controller or drive issue, 10 full iterations over the entire drive would cause the issue to surface, no?
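After such a run, only the kernel messages that appeared during the test matter. A sketch that keeps the lines present in dmesg-after.log but not in dmesg-before.log and filters them for the kinds of messages seen in this report (libata messages, end_request I/O errors, EXT3/EXT4 errors, journal aborts). The function name `new_disk_errors` and the exact filter pattern are assumptions; the comparison relies on the dmesg timestamps making each line unique.

```shell
# new_disk_errors BEFORE AFTER
# Print lines of AFTER that do not appear verbatim in BEFORE and that look
# like disk or filesystem trouble.
new_disk_errors() {
    grep -vxF -f "$1" "$2" |
        grep -E 'ata[0-9.]+: |end_request: I/O error|EXT[34]-fs|Aborting journal'
}
```

Usage: `new_disk_errors dmesg-before.log dmesg-after.log`; no output means no new disk-related messages were logged.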

Narcis Garcia (narcisgarcia) wrote :

Jakob Unterwurzacher, I don't know how to see the error message, because the partitioner crashes and restarts immediately. But the message you are showing comes from a different phase than mine, OK.

Jakob Unterwurzacher (jakobunt) wrote :

Jamin, I have no explanation for why this does not happen with ext3 (or badblocks, or dd). On the other hand, I cannot see how ext4 could trigger ATA errors without some kind of hardware problem.

Jamin W. Collins (jcollins) wrote :

I've just experienced what appears to be the same type of issue trying to install the 9.10 64-bit server edition inside a KVM virtual machine that uses an LVM volume for its storage. Every attempt to install using EXT4 failed with I/O errors and eventually the /target volume being remounted read-only. The host system shows no problems with the LVM volume, and several other VMs are running happily. Simply changing the partition format from EXT4 to EXT3 solved the issue, and the VM completed its installation.

Colin Watson (cjwatson) wrote :

I'm not sure there's much ubiquity can do here; reassigning over to the kernel since, if it verifiably isn't a hardware problem, then the kernel is about the only other piece that could be broken ...

affects: ubiquity (Ubuntu) → linux (Ubuntu)
Jeremy Foshee (jeremyfoshee) wrote :

This bug report was marked as Incomplete and has not had any updated comments for quite some time. As a result this bug is being closed. Please reopen if this is still an issue in the current Ubuntu development release http://cdimage.ubuntu.com/daily-live/current/ . Also, please be sure to provide any requested information that may have been missing. To reopen the bug, click on the current status under the Status column and change the status back to "New". Thanks.

[This is an automated message. Apologies if it has reached you inappropriately; please just reply to this message indicating so.]

tags: added: kj-expired
Changed in linux (Ubuntu):
status: Incomplete → Expired