Xen blkfront i/o errors prevent boot in domU

Bug #824089 reported by Malcolm Scott
This bug affects 11 people
Affects         Status   Importance  Assigned to  Milestone
Linux           Invalid  Undecided   Unassigned
linux (Ubuntu)  Invalid  Undecided   Unassigned

Bug Description

When booting a Xen PV domU on 3.0.0-8-generic-pae (latest in oneiric) my root filesystem falls over immediately after boot:

[ 5.685992] blkfront: barrier: empty write xvdb op failed
[ 5.685998] blkfront: xvdb: barrier or flush: disabled
[ 5.701669] end_request: I/O error, dev xvdb, sector 4456912
[ 5.701688] end_request: I/O error, dev xvdb, sector 4456912
[ 5.701728] Aborting journal on device xvdb-8.
[ 5.720625] journal commit I/O error
[ 5.724119] EXT4-fs error (device xvdb): ext4_journal_start_sb:296: Detected aborted journal
[ 5.724142] EXT4-fs (xvdb): Remounting filesystem read-only

I do not get the same errors when booting 2.6.38-10-generic-pae from natty (but I do still see "blkfront: xvdb: empty write barrier op failed" and "blkfront: xvdb: barriers disabled"). I don't think my disk is broken!

Complete dmesg attached.
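For triage, the failure signature quoted above (blkfront giving up on barriers/flushes, then end_request I/O errors and an aborted ext4 journal) can be picked out of a saved dmesg log with a short script. This is a minimal Python sketch, not part of any Ubuntu, apport or Xen tooling; the function name `scan_dmesg` is hypothetical, and the regular expressions only cover the message wording seen in this report (3.0 and 2.6.38 kernels), so other kernel versions may need different patterns.

```python
import re

# Patterns copied from the messages quoted in this report. Assumption: the
# wording is only known for these two kernels. 3.0 prints
# "barrier: empty write xvdb op failed", 2.6.38 prints
# "xvdb: empty write barrier op failed".
BARRIER_FAIL = re.compile(
    r"blkfront: (?:barrier: empty write (\w+) op failed"
    r"|(\w+): empty write barrier op failed)"
)
IO_ERROR = re.compile(r"end_request: I/O error, dev (\w+), sector (\d+)")
JOURNAL_ABORT = re.compile(r"Aborting journal on device (\S+)\.")


def scan_dmesg(text: str) -> dict:
    """Collect blkfront barrier failures, I/O errors and journal aborts."""
    barrier_failed = set()
    io_errors = []        # (device, sector) pairs, in log order
    journal_aborted = []  # journal device names such as "xvdb-8"
    for line in text.splitlines():
        m = BARRIER_FAIL.search(line)
        if m:
            # Exactly one of the two alternatives captured the device name.
            barrier_failed.add(m.group(1) or m.group(2))
        m = IO_ERROR.search(line)
        if m:
            io_errors.append((m.group(1), int(m.group(2))))
        m = JOURNAL_ABORT.search(line)
        if m:
            journal_aborted.append(m.group(1))
    return {
        "barrier_failed": sorted(barrier_failed),
        "io_errors": io_errors,
        "journal_aborted": journal_aborted,
    }
```

It could be run against one of the attached logs, e.g. `scan_dmesg(open("BootDmesg.txt").read())`; a device that appears in all three lists matches the pattern described in this bug.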
---
AcpiTables:

AlsaDevices:
 total 0
 crw-rw---- 1 root audio 116, 1 2011-08-19 20:43 seq
 crw-rw---- 1 root audio 116, 33 2011-08-19 20:43 timer
AplayDevices: Error: [Errno 2] No such file or directory
Architecture: i386
ArecordDevices: Error: [Errno 2] No such file or directory
AudioDevicesInUse: Error: command ['fuser', '-v', '/dev/snd/seq', '/dev/snd/timer'] failed with exit code 1:
CRDA: Error: [Errno 2] No such file or directory
DistroRelease: Ubuntu 11.10
IwConfig: Error: [Errno 2] No such file or directory
Lspci: Error: [Errno 2] No such file or directory
Lsusb: Error: [Errno 2] No such file or directory
Package: linux (not installed)
PciMultimedia:

ProcEnviron:
 LANG=en_GB.UTF-8
 SHELL=/bin/bash
ProcKernelCmdLine: root=UUID=b16ffd9c-a54a-4dd3-a839-16c382ea8787 ro console=hvc0
ProcVersionSignature: Ubuntu 3.0.0-8.11-generic-pae 3.0.1
RelatedPackageVersions:
 linux-restricted-modules-3.0.0-8-generic-pae N/A
 linux-backports-modules-3.0.0-8-generic-pae N/A
 linux-firmware 1.59
RfKill: Error: [Errno 2] No such file or directory
Tags: oneiric
Uname: Linux 3.0.0-8-generic-pae i686
UpgradeStatus: No upgrade log present (probably fresh install)
UserGroups:

Malcolm Scott (malcscott) wrote :
Brad Figg (brad-figg) wrote : Missing required logs.

This bug is missing log files that will aid in diagnosing the problem. From a terminal window please run:

apport-collect 824089

and then change the status of the bug to 'Confirmed'.

If, due to the nature of the issue you have encountered, you are unable to run this command, please add a comment stating that fact and change the bug status to 'Confirmed'.

This change has been made by an automated script, maintained by the Ubuntu Kernel Team.

Changed in linux (Ubuntu):
status: New → Incomplete
tags: added: natty
Malcolm Scott (malcscott) wrote : BootDmesg.txt

apport information

tags: added: apport-collected oneiric
description: updated
Malcolm Scott (malcscott) wrote : CurrentDmesg.txt

apport information

Malcolm Scott (malcscott) wrote : ProcCpuinfo.txt

apport information

Malcolm Scott (malcscott) wrote : ProcInterrupts.txt

apport information

Malcolm Scott (malcscott) wrote : ProcModules.txt

apport information

Malcolm Scott (malcscott) wrote : UdevDb.txt

apport information

Malcolm Scott (malcscott) wrote : UdevLog.txt

apport information

Malcolm Scott (malcscott) wrote : WifiSyslog.txt

apport information

Malcolm Scott (malcscott) wrote : Re: Xen blkfront i/o errors on kernel 3.0.0-8-generic-pae

Updated with apport information as requested. There may be errors amongst this data since the root fs journal had already been aborted by this point (apport-collect only worked after mounting a tmpfs over /root!).

Changed in linux (Ubuntu):
status: Incomplete → Confirmed
tags: removed: natty
Malcolm Scott (malcscott) wrote : Re: Xen blkfront i/o errors on kernel 3.0.0-9-generic-pae

Bug still exists in 3.0.0-9-generic-pae.

summary: - Xen blkfront i/o errors on kernel 3.0.0-8-generic-pae
+ Xen blkfront i/o errors on kernel 3.0.0-9-generic-pae
Malcolm Scott (malcscott) wrote :

Someone else has reported this bug to xen-devel: http://lists.xensource.com/archives/html/xen-devel/2011-08/msg00852.html

Apparently 3.0.2 works.

Brad Figg (brad-figg) wrote : Test with newer development kernel (3.0.0-11.17)

Thank you for taking the time to file a bug report on this issue.

However, given the number of bugs that the Kernel Team receives during any development cycle it is impossible for us to review them all. Therefore, we occasionally resort to using automated bots to request further testing. This is such a request.

We have noted that there is a newer version of the development kernel currently in the release pocket than the one you tested when this issue was found. Please test again with the newer kernel and indicate in the bug if this issue still exists or not.

If the bug still exists, change the bug status from Incomplete to New. If the bug no longer exists, change the bug status from Incomplete to Fix Released.

Thank you for your help.

Changed in linux:
status: New → Incomplete
tags: added: kernel-request-3.0.0-11.17
Malcolm Scott (malcscott) wrote : Re: Xen blkfront i/o errors on kernel 3.0.0-9-generic-pae

Bug still exists in 3.0.0-11.17.

Changed in linux:
status: Incomplete → New
summary: - Xen blkfront i/o errors on kernel 3.0.0-9-generic-pae
+ Xen blkfront i/o errors on kernel 3.0.0-11.17
Brad Figg (brad-figg) wrote : Missing required logs.

This bug is missing log files that will aid in diagnosing the problem. From a terminal window please run:

apport-collect 824089

and then change the status of the bug to 'Confirmed'.

If, due to the nature of the issue you have encountered, you are unable to run this command, please add a comment stating that fact and change the bug status to 'Confirmed'.

This change has been made by an automated script, maintained by the Ubuntu Kernel Team.

Changed in linux:
status: New → Incomplete
Malcolm Scott (malcscott) wrote : Re: Xen blkfront i/o errors on kernel 3.0.0-11.17

Apport logs already present...

Changed in linux:
status: Incomplete → Confirmed
Brad Figg (brad-figg) wrote : Test with newer development kernel (3.0.0-11.18)

Thank you for taking the time to file a bug report on this issue.

However, given the number of bugs that the Kernel Team receives during any development cycle it is impossible for us to review them all. Therefore, we occasionally resort to using automated bots to request further testing. This is such a request.

We have noted that there is a newer version of the development kernel currently in the release pocket than the one you tested when this issue was found. Please test again with the newer kernel and indicate in the bug if this issue still exists or not.

If the bug still exists, change the bug status from Incomplete to Confirmed. If the bug no longer exists, change the bug status from Incomplete to Fix Released.

Thank you for your help.

Changed in linux:
status: Confirmed → Incomplete
tags: added: kernel-request-3.0.0-11.18
Brad Figg (brad-figg) wrote :

Thank you for taking the time to file a bug report on this issue.

However, given the number of bugs that the Kernel Team receives during any development cycle it is impossible for us to review them all. Therefore, we occasionally resort to using automated bots to request further testing. This is such a request.

We have noted that there is a newer version of the development kernel currently in the release pocket than the one you tested when this issue was found. Please test again with the newer kernel and indicate in the bug if this issue still exists or not.

If the bug still exists, change the bug status from Incomplete to Confirmed. If the bug no longer exists, change the bug status from Incomplete to Fix Released.

Thank you for your help.

Malcolm Scott (malcscott) wrote : Re: Xen blkfront i/o errors on kernel 3.0.0-11.18

Bug still present in 3.0.0-11.18

summary: - Xen blkfront i/o errors on kernel 3.0.0-11.17
+ Xen blkfront i/o errors on kernel 3.0.0-11.18
Changed in linux:
status: Incomplete → Confirmed
Brad Figg (brad-figg) wrote : Test with newer development kernel (3.0.0-12.19)

Thank you for taking the time to file a bug report on this issue.

However, given the number of bugs that the Kernel Team receives during any development cycle it is impossible for us to review them all. Therefore, we occasionally resort to using automated bots to request further testing. This is such a request.

We have noted that there is a newer version of the development kernel than the one you last tested when this issue was found. Please test again with the newer kernel and indicate in the bug if this issue still exists or not.

If the bug still exists, change the bug status from Incomplete to Confirmed. If the bug no longer exists, change the bug status from Incomplete to Fix Released.

Thank you for your help, we really do appreciate it.

Changed in linux:
status: Confirmed → Incomplete
tags: added: kernel-request-3.0.0-12.19
Brad Figg (brad-figg) wrote : Test with newer development kernel (3.0.0-12.20)

Thank you for taking the time to file a bug report on this issue.

However, given the number of bugs that the Kernel Team receives during any development cycle it is impossible for us to review them all. Therefore, we occasionally resort to using automated bots to request further testing. This is such a request.

We have noted that there is a newer version of the development kernel than the one you last tested when this issue was found. Please test again with the newer kernel and indicate in the bug if this issue still exists or not.

If the bug still exists, change the bug status from Incomplete to Confirmed. If the bug no longer exists, change the bug status from Incomplete to Fix Released.

Thank you for your help, we really do appreciate it.

tags: added: kernel-request-3.0.0-12.20
Malcolm Scott (malcscott) wrote : Re: Xen blkfront i/o errors on kernel 3.0.0-11.18

Bug still present in 3.0.0-12.20.

Changed in linux:
status: Incomplete → Confirmed
summary: - Xen blkfront i/o errors on kernel 3.0.0-11.18
+ Xen blkfront i/o errors
summary: - Xen blkfront i/o errors
+ Xen blkfront i/o errors prevent boot in domU
Joseph Salisbury (jsalisbury) wrote :

Would it be possible for you to test the latest upstream kernel? It will allow additional upstream developers to examine the issue. Refer to https://wiki.ubuntu.com/KernelMainlineBuilds . Once you've tested the upstream kernel, please remove the 'needs-upstream-testing' tag. This can be done by clicking on the yellow pencil icon next to the tag located at the bottom of the bug description and deleting the 'needs-upstream-testing' text. Please let us know your results.

Thanks in advance.

tags: added: needs-upstream-testing
Konrad Rzeszutek Wilk (konrad-wilk) wrote :

What is your dom0?

Malcolm Scott (malcscott) wrote :

Konrad: XenServer 5.6 SP2.

Malcolm Scott (malcscott) wrote :

The bug is present in mainline kernel 3.1-rc9.

tags: removed: kernel-request-3.0.0-11.17 kernel-request-3.0.0-11.18 kernel-request-3.0.0-12.19 kernel-request-3.0.0-12.20 needs-upstream-testing
Brad Figg (brad-figg) wrote : Test with newer development kernel (3.0.0-12.20)

Thank you for taking the time to file a bug report on this issue.

However, given the number of bugs that the Kernel Team receives during any development cycle it is impossible for us to review them all. Therefore, we occasionally resort to using automated bots to request further testing. This is such a request.

We have noted that there is a newer version of the development kernel than the one you last tested when this issue was found. Please test again with the newer kernel and indicate in the bug if this issue still exists or not.

If the bug still exists, change the bug status from Incomplete to Confirmed. If the bug no longer exists, change the bug status from Incomplete to Fix Released.

Thank you for your help, we really do appreciate it.

Changed in linux:
status: Confirmed → Incomplete
tags: added: kernel-request-3.0.0-12.20
Malcolm Scott (malcscott) wrote :

Already confirmed on 3.0.0-12.20.

Changed in linux:
status: Incomplete → Confirmed
Tim Gardner (timg-tpi) wrote :

Malcolm - I believe XenServer 6 is the first version to be certified with Lucid (10.04) as a DomU. Are you at liberty to test that version pairing? If 11.10 then doesn't work as a DomU, we need to have a look. 11.10 should also work as a Dom0 on XS 6.

Stefan Bader (smb) wrote :

The barrier failure messages as such are just informational. The interesting part to me seems to be that while xvda and xvdc show the capacity change message, there is nothing for xvdb. Maybe I missed the info, but it would be interesting to know what kind of mapping is used for the PV disks (probably all of them) in the Xen guest configuration. And maybe there is something wrong with whatever the mapping target is?

Changed in linux (Ubuntu):
status: Confirmed → Triaged
Malcolm Scott (malcscott) wrote :

For the record I've just tested linux-image-3.2.0-030200rc2-generic_3.2.0-030200rc2.201111151435 and the bug is still present. This is still on XenServer 5.6 SP2; I'll try to test on a XS 6 install later. (Lucid does work on both versions; the problem is only with Oneiric.)

Stefan: I hadn't spotted the capacity change discrepancy, but on my latest test with the above kernel I do get capacity change messages for all three disks:

[ 0.492436] blkfront: xvda: barrier: enabled
[ 0.495160] xvda: unknown partition table
[ 0.496794] blkfront: xvdb: barrier: enabled
[ 0.498990] xvdb: unknown partition table
[ 0.499217] Setting capacity to 524288
[ 0.499225] xvda: detected capacity change from 0 to 268435456
[ 0.499911] blkfront: xvdc: barrier: enabled
[ 0.501937] xvdc: unknown partition table
[ 0.502127] Setting capacity to 12582912
[ 0.502136] xvdb: detected capacity change from 0 to 6442450944
[ 0.507102] Setting capacity to 1024000
[ 0.507117] xvdc: detected capacity change from 0 to 524288000

These disks are all XenServer's NFS-backed virtual disks. The bug does NOT occur if the root filesystem is on local storage.

summary: - Xen blkfront i/o errors prevent boot in domU
+ Xen blkfront i/o errors on NFS-backed disks prevent boot in domU
Malcolm Scott (malcscott) wrote : Re: Xen blkfront i/o errors on NFS-backed disks prevent boot in domU

The bug also occurs on XenServer 6 (with Oneiric's kernel, on NFS-backed storage).

Malcolm Scott (malcscott) wrote :

Correction -- the bug DOES *sometimes* occur if the root filesystem is on local storage. On local storage, the bug has so far occurred on three out of six boot attempts. I have yet to see a successful boot on NFS storage.

summary: - Xen blkfront i/o errors on NFS-backed disks prevent boot in domU
+ Xen blkfront i/o errors prevent boot in domU
Alexander Gordeev (a-gordeev) wrote :

Hi folks, I found a workaround on the xen-devel mailing list.

Just disable barriers for ext4 by adding 'barrier=0' to the mount options in your /etc/fstab.

The line for your root fs should look like this:
/dev/xvdc /blah ext4 errors=remount-ro,barrier=0 0 1
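Editing many guests by hand is error-prone, so here is a minimal Python sketch that rewrites a single /etc/fstab line to carry the option. The helper name `add_mount_option` is hypothetical; it only touches the fourth (mount options) field and leaves comments and blank lines alone. Bear in mind that barrier=0 gives up the filesystem's write-ordering guarantees, and it is reported below not to avoid the corruption every time, so treat it as a stopgap.

```python
def add_mount_option(fstab_line: str, option: str = "barrier=0") -> str:
    """Return fstab_line with `option` set in its mount-options field.

    Comment lines, blank lines and lines with fewer than four fields are
    returned unchanged. In modified lines, whitespace between fields is
    normalised to a single space.
    """
    stripped = fstab_line.strip()
    if not stripped or stripped.startswith("#"):
        return fstab_line
    fields = stripped.split()
    if len(fields) < 4:
        return fstab_line
    key = option.split("=", 1)[0]
    # Drop any existing setting for the same key (e.g. barrier=1) first,
    # so the line never carries two conflicting values.
    options = [o for o in fields[3].split(",") if o.split("=", 1)[0] != key]
    options.append(option)
    fields[3] = ",".join(options)
    return " ".join(fields)
```

For example, `add_mount_option("/dev/xvdc /blah ext4 errors=remount-ro 0 1")` returns the line from the comment above, with `,barrier=0` appended to the options.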

Alexander Gordeev (a-gordeev) wrote :

Unfortunately, the workaround described above sometimes doesn't work, and xen-blkfront corrupts the filesystem again. With barriers enabled, though, it corrupts the filesystem on every second boot.

Iustinian T. (iustinian) wrote :

Same on Lucid with the latest linux-image-virtual-lts-backport-oneiric 3.0.0-13-virtual #22~lucid1-Ubuntu SMP. Disabling barriers works, but since these are supposed to be production-ready boxes, the only option is to revert to linux-image-virtual-lts-backport-natty.

Michael MacLeod (mikemacleod) wrote :

I just want to add that I'm encountering this issue with a Debian Squeeze (Xen 4.0.1) dom0. I'm using LVM volumes to store my disk images (phy:/dev/Data/foobar,xvda,w) and installed the domU using netinstall. The install works fine; the error only happens when booting the domU for the first time from the installed kernel.

Malcolm Scott (malcscott) wrote :

This bug is still present in the 3.2.2-030202 mainline build.

Malcolm Scott (malcscott) wrote :

This bug is also still present in the 3.3.0-030300rc1 mainline build.

Stefan Bader (smb) wrote :

I looked back through the mailing list archives and found a thread titled "blkfront problem in pvops kernel when barriers enabled". As far as I understand, this is a problem in the backend driver (which is part of the host code): it advertises the barrier method for syncing but then fails empty barrier requests. In my testing with an Ubuntu-based dom0 (Xen 4.1.2), the guest uses cache flushes for syncs, and I did not experience any of the observed problems.

So it looks to me like this needs to be fixed in the host (XenServer or Debian) by making sure it includes the following change:

# HG changeset patch
From: Jan Beulich <email address hidden>
# Date 1306409621 -3600
# Node ID 876a5aaac0264cf38cae6581e5714b93ec380aaa
# Parent aedb712c05cf065e943e15d0f38597c2e80f7982
Subject: xen/blkback: don't fail empty barrier requests

Malcolm Scott (malcscott) wrote :

This turned out to be a XenServer bug related to the interaction between blkback and barriers; my contact at Citrix has produced a dom0 kernel patch which fixes the issue. I'm told the fix will be incorporated in the next point release of XenServer.

Changed in linux (Ubuntu):
status: Triaged → Invalid
Changed in linux:
status: Confirmed → Invalid
Matt Sealey (mwsealey) wrote :

Any chance your contact gave a better timeframe than "the next point release"? There have already been two point releases (up to 6.0.2), and I'm still seeing the problem here at a most infuriating rate.

Is there some way of making sure that a new VM install won't have any problems with this? Once the system is booted, it seems fine here apart from the fact that the root filesystem is mounted read-only. I don't want to remove remount-ro from fstab in case a real filesystem error crops up...

Malcolm Scott (malcscott) wrote :

The release incorporating this fix is due around late July / early August. The only workaround I know of for now is to install a patched kernel in the XenServer dom0. I have one, and you may contact me by email if you would like a copy, but bear in mind that it is very much not supported by Citrix.

Malcolm Scott (malcscott) wrote :

I should clarify that the timeframe for the XenServer release (6.1) I gave above is an unofficial, unauthoritative estimate and subject to alteration by Citrix.

Torsten Krah (tkrah) wrote :

Still there with current precise LTS.

Changed in linux (Ubuntu):
status: Invalid → Confirmed
tags: added: precise
Malcolm Scott (malcscott) wrote :

Torsten, this is a XenServer bug and not an Ubuntu bug. No current release of XenServer is capable of running guests with recent Linux kernels. A fix is (still) pending from Citrix. If you want an unofficial fix (a replacement kernel for XenServer dom0, very unsupported) contact me privately. If you have bought XenServer, it can't hurt to raise a ticket with Citrix asking for the official patch to be published more quickly.

Changed in linux (Ubuntu):
status: Confirmed → Invalid
Nate Carlson (natecarlson) wrote :

I've hit similar issues with XFS, and used the not-so-lovely workaround of setting 'barrier=0' as mount options for that filesystem, which works great. Not sure if it'd work with ext4 or not.

@Malcolm, do you have a Ticket ID that I can reference in a support ticket to Citrix?

Side note - XenServer 6.1 sounds like it's still a ways out, at least last I heard.
