Installation randomly fails with: File "/usr/lib/ubiquity/ubiquity/install_misc.py", line 621, in copy_file targetfh.write(buf) IOError: [Errno 22] Invalid argument

Bug #894768 reported by Jean-Baptiste Lallement on 2011-11-25
This bug affects 74 people
Affects: linux (Ubuntu); Importance: High; Assigned to: Andy Whitcroft
Affects: Precise; Importance: High; Assigned to: Andy Whitcroft

Bug Description

Precise desktop images installation fails regularly with the following error:

install.py: Exception during installation:
install.py: Traceback (most recent call last):
install.py: File "/usr/share/ubiquity/install.py", line 656, in <module>
install.py: install.run()
install.py: File "/usr/share/ubiquity/install.py", line 130, in run
install.py: self.copy_all()
install.py: File "/usr/share/ubiquity/install.py", line 423, in copy_all
install.py: install_misc.copy_file(self.db, sourcepath, targetpath, md5_check)
install.py: File "/usr/lib/ubiquity/ubiquity/install_misc.py", line 621, in copy_file
install.py: targetfh.write(buf)
install.py: IOError: [Errno 22] Invalid argument
install.py:

This happens during automated installations but also during a manual installation. All the tests have been run on VMs. I haven't found any pattern to reproduce this crash and it affects i386 and amd64.

The host system runs Oneiric amd64 with kernel 3.0.0-12.

Running the exact same test again passes most of the time.

ProblemType: Bug
DistroRelease: Ubuntu 12.04
Package: ubiquity (not installed)
ProcVersionSignature: Ubuntu 3.2.0-1.3-generic 3.2.0-rc2
Uname: Linux 3.2.0-1-generic x86_64
ApportVersion: 1.90-0ubuntu1
Architecture: amd64
Date: Fri Nov 25 15:01:02 2011
EcryptfsInUse: Yes
InstallationMedia: Ubuntu 11.10 "Oneiric Ocelot" - Release amd64 (20111012)
ProcEnviron:
 PATH=(custom, user)
 LANG=en_US.utf8
 SHELL=/bin/bash
SourcePackage: ubiquity
UpgradeStatus: No upgrade log present (probably fresh install)

Jean-Baptiste Lallement (jibel) wrote :
description: updated
Jean-Baptiste Lallement (jibel) wrote :
  • dm (6.4 KiB, text/plain)
Martin Pitt (pitti) wrote :

For the record, Jean-Baptiste says that there are no sda/xda related errors in dmesg.

Jean-Baptiste Lallement (jibel) wrote :

Bug 896546 also fails with 'Invalid argument', but in targetfh.close(), whereas here it failed in targetfh.write(buf).

Launchpad Janitor (janitor) wrote :

Status changed to 'Confirmed' because the bug affects multiple users.

Changed in ubiquity (Ubuntu):
status: New → Confirmed
Jean-Baptiste Lallement (jibel) wrote :

I ran 20 installations of an i386 image and 50% of the tests failed with this error.
I ran the same test with an Oneiric image and none of the tests failed.

The duplicate shows that it happens in another environment, and I also got this error once during an alternate installation. I'm moving this report to the kernel.

affects: ubiquity (Ubuntu) → linux (Ubuntu)

Thank you for taking the time to file a bug report on this issue.

However, given the number of bugs that the Kernel Team receives during any development cycle it is impossible for us to review them all. Therefore, we occasionally resort to using automated bots to request further testing. This is such a request.

We have noted that there is a newer version of the development kernel than the one you last tested when this issue was found. Please test again with the newer kernel and indicate in the bug if this issue still exists or not.

If the bug still exists, change the bug status from Incomplete to Confirmed. If the bug no longer exists, change the bug status from Incomplete to Fix Released.

If you want this bot to quit automatically requesting kernel tests, add a tag named: bot-stop-nagging.

 Thank you for your help, we really do appreciate it.

Changed in linux (Ubuntu):
status: Confirmed → Incomplete
tags: added: kernel-request-3.2.0-2.4
description: updated
Changed in linux (Ubuntu):
status: Incomplete → Confirmed
tags: added: bot-stop-nagging ubiquity-2.9.4
removed: ubiquity-2.8.7
Martin Pitt (pitti) wrote :

I ran two installations of today's precise image in a row under a current precise host with the precise kernel, then two more under a precise host running a lucid kernel, and all four succeeded. In the VM (under the live system) I also tried to copy /bin/ to /target 10,000 times in all four cases, which also did not trigger the bug. But I guess I've just been lucky here, or something else in my environment is different. In all cases I was using virtio for the VM hard disk.

Does this also happen when not using virtio? If not, this might provide a workaround for the time being.

More testing shows that it mainly affects i386 VMs running on an amd64 host (~50% of the runs fail in this configuration, while it failed only once with an amd64 guest).

I tried virtio and hda and it doesn't make a difference.

Changed in linux (Ubuntu Precise):
assignee: nobody → Canonical Kernel Team (canonical-kernel-team)
importance: Undecided → High
tags: added: kernel-da-key kernel-key
Changed in linux (Ubuntu Precise):
status: Confirmed → Triaged
status: Triaged → Confirmed
tags: removed: kernel-key
tags: added: iso-testing
tags: added: rls-mgr-p-tracking
removed: running-unity
Joseph Salisbury (jsalisbury) wrote :

Does this only happen when the guest is a Precise VM? Does this happen if the guest is an Oneiric VM? It would be good to narrow this down to an issue with the guest or host.

Changed in linux (Ubuntu Precise):
status: Confirmed → Triaged
Joseph Salisbury (jsalisbury) wrote :

Please ignore my comment #13. I see you answered that in comment #9.

tags: added: kernel-key

No, as I said in comment #9, it only fails if the guest is a Precise VM, on the exact same testing environment.
I haven't tried on a Precise host yet.

It fails with KVM but duplicates show that it fails also with VirtualBox.

Joseph Salisbury (jsalisbury) wrote :

It would be great if additional investigation can be done to try and root cause this issue. Having a reliable reproducer would be very helpful.

Joseph Salisbury (jsalisbury) wrote :

Do you know if you see this on other test systems (same test setup using an Oneiric host and a Precise VM, but on another physical machine)? It would be good to rule out any hardware issues.

Joseph Salisbury (jsalisbury) wrote :

re: comment #17. I guess this test wouldn't be needed since there are duplicates of this bug.

Andy Whitcroft (apw) wrote :

So in the testing we have done so far we know that Oneiric images are not affected, and Precise ones are. I see that the application is a python one, so I assume that python, libc and the kernel are all different between these images. We need to determine which layer is returning this report. Will try and reproduce here.

Andy Whitcroft (apw) wrote :

QA reports that this first started breaking on 2011-11-16. Looking at the publishing history, the kernel changed on 2011-10-28 and 2011-11-18, and that was a microscopic update. The versions of python and eglibc would also have been the same as in oneiric:

10:42:43 jibel | apw, it started on Nov. 16th with Ubuntu 12.04 LTS "Precise Pangolin" -
                    | Alpha i386 (20111116)
10:45:56 pitti | apw: that pretty much rules out eglibc then
10:46:16 pitti | apw: python as well, unchanged since oneiric
10:50:38 apw | pitti, to confirm i am reading the publishing history right, we uploaded
                    | 3.1.0-2.2 on the 2011-10-28 and 2.3 on the 2011-11-18
10:50:53 pitti | right

Andy Whitcroft (apw) wrote :

It appears that the KVM hosts in this testing may well have been upgraded from Natty to Oneiric at the time of the start of these failures. Investigating that.

Martin Pitt (pitti) wrote :

At last! I'm able to reproduce the bug now. It only happens when the host is running the oneiric kernel (running 2.6.38-13.52-generic now). I was never able to reproduce it while running the lucid or precise kernel on the host.

It seems it's a lot more likely to trigger if the guest is i386. I ran

kvm -m 2048 -vga std -net none -drive if=virtio,index=0,file=/home/martin-scratch/images/test.img -cdrom ~/download/ubuntu/precise-desktop-i386.iso -boot d

Martin Pitt (pitti) wrote :

I locally patched ubiquity to capture the file it's crashing on. In this run it was /usr/src/linux-headers-3.2.0-1/arch/mips/include/asm/cacheops.h

Doing another run to see whether it's random or always the same file.
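
A minimal sketch of what such a local patch could look like, not the actual change made to ubiquity: wrap a chunked copy so that any I/O error is re-raised with the source and target paths attached. The name `copy_file_logged` and the chunk size are assumptions made for illustration; ubiquity's real `copy_file` takes different arguments (see the traceback in the description).

```python
import errno


def copy_file_logged(sourcepath, targetpath, chunk_size=16 * 1024):
    """Copy sourcepath to targetpath in chunks, naming the files on failure.

    Hypothetical helper: it mirrors the write()/close() shape seen in the
    traceback but is not ubiquity's implementation.
    """
    try:
        with open(sourcepath, "rb") as sourcefh:
            targetfh = open(targetpath, "wb")
            try:
                while True:
                    buf = sourcefh.read(chunk_size)
                    if not buf:
                        break
                    targetfh.write(buf)
            finally:
                # close() flushes buffered data, so it can raise as well.
                targetfh.close()
    except (IOError, OSError) as exc:
        # Re-raise with the same errno, but with both paths in the message.
        code = errno.errorcode.get(exc.errno, "unknown errno")
        message = "%s (%s) while copying %s to %s" % (
            exc.strerror, code, sourcepath, targetpath)
        raise IOError(exc.errno, message)
```

With a wrapper like this, the installer log would show which file was being written when EINVAL came back, which is enough to tell whether the failure follows one file or moves around.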

Martin Pitt (pitti) wrote :

I reproduced it again, this time it's some humanity SVG. In both cases it failed in close(), but we also have reports where it fails in write(), so I guess the close() is just the final buffer dump.

So it's not happening for a particular file only, or a weird file type.
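
To illustrate the "final buffer dump" guess above, here is a small sketch of how a buffered file object can move an error from write() to close(). `FailingRaw` is a made-up stand-in for a file whose underlying write fails with EINVAL; it is not how the kernel bug is triggered, it only shows where the exception surfaces. The behaviour shown assumes CPython's `io.BufferedWriter`.

```python
import errno
import io


class FailingRaw(io.RawIOBase):
    """Stand-in for a file whose underlying write() fails with EINVAL."""

    def writable(self):
        return True

    def write(self, data):
        raise OSError(errno.EINVAL, "Invalid argument")


def where_it_fails(payload, buffer_size):
    """Return 'write' or 'close' depending on which call raises."""
    fh = io.BufferedWriter(FailingRaw(), buffer_size=buffer_size)
    try:
        fh.write(payload)
    except OSError:
        return "write"
    try:
        # close() flushes whatever is still sitting in the buffer.
        fh.close()
    except OSError:
        return "close"
    return "none"
```

A payload that fits in the buffer is only handed to the raw file at close(), so the error appears there; a payload larger than the buffer is passed through during write(), so the error appears earlier. That would be consistent with seeing both variants of the traceback for the same underlying failure.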

Bug 897894 uses VMWare as virtualization technology. This is the 3rd virtualization tool (with kvm and vbox) affected by this bug.

Jeff Lane (bladernr) wrote :

I should add that with Ubuntu I saw this on 2/2 attempts earlier today, and with Xubuntu just now 1/3 attempts failed on my system (64-bit host running VirtualBox, not the oss-vbox in the repos, with a 32-bit VM).

Jeff Lane (bladernr) wrote :

Correction... Xubuntu failed 2/3 times, and both failures were side-by-side installs.

Martin Pitt (pitti) wrote :

Duplicate bug 898040 happened on bare metal, so it seems using a VM just makes this a lot more likely.

Andy Whitcroft (apw) wrote :

I also did some tracing on the QEMU side of my KVM instance and was not able to see any spurious EINVALs from the host. Clearly there is a timing component, as replacing the HOST kernel does have an effect. But, as noted in comment #28, since it can trigger on a bare metal install it must be on the GUEST side.

Changed in linux (Ubuntu Precise):
assignee: Canonical Kernel Team (canonical-kernel-team) → Andy Whitcroft (apw)
status: Triaged → In Progress

I can reproduce this on an installed Precise i386 VM running on an Oneiric amd64 host, using fragments of code from ubiquity (attachment 894768.invalid_argument).

I downloaded 3GB of deb files, then copied them in a loop as below:
$ for i in $(seq 1 10); do rm -Rf target/*; ./894768.invalid_argument archive target; done

It often crashes on the 1st loop, and always on the 2nd loop.

syslog with kernel 3.2.0-2.5lp894768v201112011120_i386 attached.
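
The attached script is not reproduced in this report. As a rough sketch of the same idea, assuming nothing about its contents, the loop below wipes a target directory, copies a source tree into it file by file, and stops at the first copy that fails with EINVAL. The function name `stress_copy` and its arguments are hypothetical.

```python
import errno
import os
import shutil


def stress_copy(source_dir, target_dir, loops=10):
    """Copy source_dir into target_dir repeatedly; report the first EINVAL.

    Returns (loop_number, path) for the first file whose copy fails with
    EINVAL, or None if every loop completes. Hypothetical stand-in for the
    attached script, which is not shown in this report.
    """
    for loop in range(1, loops + 1):
        # Start each loop from an empty target, like 'rm -Rf target/*'.
        shutil.rmtree(target_dir, ignore_errors=True)
        for root, _dirs, files in os.walk(source_dir):
            rel = os.path.relpath(root, source_dir)
            dest_root = os.path.normpath(os.path.join(target_dir, rel))
            os.makedirs(dest_root, exist_ok=True)
            for name in files:
                src = os.path.join(root, name)
                try:
                    shutil.copyfile(src, os.path.join(dest_root, name))
                except OSError as exc:
                    if exc.errno == errno.EINVAL:
                        return loop, src
                    raise
    return None
```

Called as `stress_copy("archive", "target")` against a few GB of files, a non-None result would give both the loop number and the file that tripped the error.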

Andreas (andreas-g201) wrote :

Crash in alpha 1 release, too.

Raphaël Hertzog (hertzog) wrote :

The kernel unexpectedly returning EINVAL for I/O is not something new unfortunately, I saw it in a few dpkg related bug reports too: https://bugs.launchpad.net/ubuntu/+source/linux-lts-backport-natty/+bug/827942
https://bugs.launchpad.net/ubuntu/+source/tzdata/+bug/773850

tags: added: bugpattern-written

dpkg is affected as well (this is on amd64):
Dec 7 08:21:49 in-target: Unpacking language-pack-en-base (from .../language-pack-en-base_1%3a11.10+20111006_all.deb) ...
Dec 7 08:21:49 in-target: dpkg: error processing /var/cache/apt/archives/language-pack-en-base_1%3a11.10+20111006_all.deb (--unpack):
Dec 7 08:21:49 in-target: failed in write on buffer copy for backend dpkg-deb during `./usr/share/locale-langpack/en_GB/LC_MESSAGES/libnih.mo': Invalid argument
Dec 7 08:21:49 in-target: dpkg-deb: error: subprocess paste was killed by signal (Broken pipe)

Note that in the case of server amd64 installations, it always fails while extracting a .mo file from language-pack-en-base
It was reproducible on 4 different runs.
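
When comparing several installer logs, a small filter can pull out the file dpkg was writing each time it hit this error. The sketch below only assumes the message shape quoted in the log excerpt above; the helper name `einval_paths` is made up, and other dpkg versions may word the message differently.

```python
import re

# Matches the dpkg message shape quoted above (an assumption about its
# stability across dpkg versions).
_EINVAL_RE = re.compile(
    r"failed in write on buffer copy .* during `(?P<path>[^']+)': "
    r"Invalid argument")


def einval_paths(log_lines):
    """Return the file paths dpkg was writing when it hit 'Invalid argument'."""
    paths = []
    for line in log_lines:
        match = _EINVAL_RE.search(line)
        if match:
            paths.append(match.group("path"))
    return paths
```

Running it over the syslog of each failed run would show quickly whether the failure really lands on the same .mo file every time.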

Andy Whitcroft (apw) wrote :

I think I have identified a bug in the kernel when writing to ext4. The latest test kernel at the URL below contains the likely fix for this issue. @Jean-Baptiste -- if you could test that and report back:

    http://people.canonical.com/~apw/lp894768-precise/

I tried kernel 201112070916 on a test environment, and ran 50 iterations of the test with and without CPU or Disk activity:
* Current official Ubuntu kernel: Crash after 2 iterations
* Kernel 201112070916 x86: No crash after 50 iterations

Tim Gardner (timg-tpi) on 2011-12-07
Changed in linux (Ubuntu Precise):
status: In Progress → Fix Committed
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package linux - 3.2.0-3.9

---------------
linux (3.2.0-3.9) precise; urgency=low

  [ Andy Whitcroft ]

  * SAUCE: ext4: correct partial write discard size calculation
    - LP: #894768

  [ Leann Ogasawara ]

  * Revert "SAUCE: x86, microcode, AMD: Restrict microcode reporting"
    - LP: #892615

  [ Matthew Garrett ]

  * SAUCE: pci: Rework ASPM disable code

  [ Upstream Kernel Changes ]

  * x86: Fix boot failures on older AMD CPU's
    - LP: #892615
  * EHCI : Fix a regression in the ISO scheduler
    - LP: #899165
 -- Leann Ogasawara <email address hidden> Mon, 05 Dec 2011 10:37:36 -0800

Changed in linux (Ubuntu Precise):
status: Fix Committed → Fix Released
Walter Lapchynski (wxl) wrote :

where is this fix released at? i can't find it in the repos, even in proposed.

Walter Lapchynski [2011-12-11 6:47 -0000]:
> where is this fix released at? i can't find it in the repos, even in
> proposed.

It only affected Precise, the current development version.

Walter Lapchynski (wxl) wrote :

i should also add it doesn't seem to be in the current daily isos, at
least from last time i checked. at what point will one be able to resume
proper testing of the isos again?

On 12/11/2011 03:18 AM, Martin Pitt wrote:
> Walter Lapchynski [2011-12-11 6:47 -0000]:
>> where is this fix released at? i can't find it in the repos, even in
>> proposed.
> It only affected Precise, the current development version.
>

@walter, it is fixed in kernel 3.2.0-3.9, which is in the latest Precise daily iso (20111211). If you still get this bug, please file a new report, paste the number here and we'll handle it from there.

Thanks in advance.

Raphaël Hertzog (hertzog) wrote :

On Sun, 11 Dec 2011, Martin Pitt wrote:
> Walter Lapchynski [2011-12-11 6:47 -0000]:
> > where is this fix released at? i can't find it in the repos, even in
> > proposed.
>
> It only affected Precise, the current development version.

I don't think this is true. The kernel problem is not new (cf the old
failures against dpkg that I reassigned), it's just that it triggered more
easily under some circumstances in Precise.

It would be interesting to have the opinion of the kernel team on whether
it's easy to backport the fix on older kernels.

Cheers,
--
Raphaël Hertzog ◈ Debian Developer

Pre-order a copy of the Debian Administrator's Handbook and help
liberate it: http://debian-handbook.info/liberation/

Walter Lapchynski (wxl) wrote :

just installed the latest daily and same problems. 3.2.0-4-generic (>
3.2.0-3.9, no?).

Brian Murray (brian-murray) wrote :

Looking at all the duplicates of this bug none of them have a kernel version greater than the version this bug was fixed in. Walter if you are receiving this crash could you please submit a new bug report? Thanks in advance.

candtalan (aeclist) wrote :

My experience is down as a copy of this bug.
Mine:
Bug #916902

I was attempting install from a live usb of Ubuntu 12.04 alpha1
The (two) repeated crashes happened when I began installing from the live desktop session.
I then began my install from the initial boot menu without going into the desktop session. This install process did not crash.
hth

Brian Murray (brian-murray) wrote :

Bug #916902 is about a version of the kernel that was not fixed. @candtalan - you'll need to test with a daily build of the Precise ISOs not to be affected by this bug report.

This bug has been reported on the Ubuntu ISO testing tracker.

A list of all reports related to this bug can be found here:
http://iso.qa.ubuntu.com/qatracker/reports/bugs/899506

@Brian, sorry for the delay. Just pulled the latest daily and no issues.
That seems like it may be contrary to your experience?

texaswriter (in-texas-d) wrote :

http://ubuntuforums.org/showthread.php?t=1918587

WORKAROUND: Although this does not fix the problem, it is a workaround. When installing from Ubiquity, uncheck "Install Third Party ...".

This is a bug from alpha 1 (STILL IN alpha 2, CONFIRMED by me).

This bug crashed the installer twice for me.

Brian Murray (brian-murray) wrote :

@texaswriter: The forum post you've linked to is about a different bug where it is not possible to download and install flashplugin because the system being installed to cannot resolve archive.canonical.com. This is another bug that we are working on fixing.
