Xen blkfront i/o errors prevent boot in domU

Reported by Malcolm Scott on 2011-08-10
This bug affects 11 people
Affects          Status    Importance   Assigned to
Linux            Invalid   Undecided    Unassigned
linux (Ubuntu)   Invalid   Undecided    Unassigned

Bug Description

When booting a Xen PV domU on 3.0.0-8-generic-pae (latest in oneiric) my root filesystem falls over immediately after boot:

[ 5.685992] blkfront: barrier: empty write xvdb op failed
[ 5.685998] blkfront: xvdb: barrier or flush: disabled
[ 5.701669] end_request: I/O error, dev xvdb, sector 4456912
[ 5.701688] end_request: I/O error, dev xvdb, sector 4456912
[ 5.701728] Aborting journal on device xvdb-8.
[ 5.720625] journal commit I/O error
[ 5.724119] EXT4-fs error (device xvdb): ext4_journal_start_sb:296: Detected aborted journal
[ 5.724142] EXT4-fs (xvdb): Remounting filesystem read-only
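
For anyone triaging several guests, the failure signature above can be picked out of a saved dmesg mechanically. The following is a minimal Python sketch, assuming the message formats are exactly those quoted in this report (other kernel versions word the barrier messages differently). The name `scan_dmesg` and the report layout are illustrative, not part of any existing tool.

```python
import re
from collections import defaultdict

# Patterns copied from the log lines quoted in this report (3.0.0-8-generic-pae).
# Assumption: other kernels may phrase these messages differently.
BARRIER_FAIL = re.compile(r"blkfront: barrier: empty write (\w+) op failed")
IO_ERROR = re.compile(r"end_request: I/O error, dev (\w+), sector (\d+)")
REMOUNT_RO = re.compile(r"EXT4-fs \((\w+)\): Remounting filesystem read-only")


def scan_dmesg(text):
    """Summarise barrier failures, I/O errors and read-only remounts per device."""
    report = defaultdict(lambda: {"barrier_failed": False,
                                  "io_error_sectors": [],
                                  "remounted_ro": False})
    for line in text.splitlines():
        m = BARRIER_FAIL.search(line)
        if m:
            report[m.group(1)]["barrier_failed"] = True
            continue
        m = IO_ERROR.search(line)
        if m:
            report[m.group(1)]["io_error_sectors"].append(int(m.group(2)))
            continue
        m = REMOUNT_RO.search(line)
        if m:
            report[m.group(1)]["remounted_ro"] = True
    return dict(report)


SAMPLE = """\
[ 5.685992] blkfront: barrier: empty write xvdb op failed
[ 5.685998] blkfront: xvdb: barrier or flush: disabled
[ 5.701669] end_request: I/O error, dev xvdb, sector 4456912
[ 5.701688] end_request: I/O error, dev xvdb, sector 4456912
[ 5.701728] Aborting journal on device xvdb-8.
[ 5.720625] journal commit I/O error
[ 5.724142] EXT4-fs (xvdb): Remounting filesystem read-only
"""

if __name__ == "__main__":
    for dev, info in scan_dmesg(SAMPLE).items():
        print(dev, info)
```

A device that shows a failed empty barrier write, followed by I/O errors and a read-only remount, matches the pattern described in this report.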

I do not get the same errors when booting 2.6.38-10-generic-pae from natty (but I do still see "blkfront: xvdb: empty write barrier op failed" and "blkfront: xvdb: barriers disabled"). I don't think my disk is broken!

Complete dmesg attached.
---
AcpiTables:

AlsaDevices:
 total 0
 crw-rw---- 1 root audio 116, 1 2011-08-19 20:43 seq
 crw-rw---- 1 root audio 116, 33 2011-08-19 20:43 timer
AplayDevices: Error: [Errno 2] No such file or directory
Architecture: i386
ArecordDevices: Error: [Errno 2] No such file or directory
AudioDevicesInUse: Error: command ['fuser', '-v', '/dev/snd/seq', '/dev/snd/timer'] failed with exit code 1:
CRDA: Error: [Errno 2] No such file or directory
DistroRelease: Ubuntu 11.10
IwConfig: Error: [Errno 2] No such file or directory
Lspci: Error: [Errno 2] No such file or directory
Lsusb: Error: [Errno 2] No such file or directory
Package: linux (not installed)
PciMultimedia:

ProcEnviron:
 LANG=en_GB.UTF-8
 SHELL=/bin/bash
ProcKernelCmdLine: root=UUID=b16ffd9c-a54a-4dd3-a839-16c382ea8787 ro console=hvc0
ProcVersionSignature: Ubuntu 3.0.0-8.11-generic-pae 3.0.1
RelatedPackageVersions:
 linux-restricted-modules-3.0.0-8-generic-pae N/A
 linux-backports-modules-3.0.0-8-generic-pae N/A
 linux-firmware 1.59
RfKill: Error: [Errno 2] No such file or directory
Tags: oneiric
Uname: Linux 3.0.0-8-generic-pae i686
UpgradeStatus: No upgrade log present (probably fresh install)
UserGroups:

Malcolm Scott (malcscott) wrote :

This bug is missing log files that will aid in diagnosing the problem. From a terminal window please run:

apport-collect 824089

and then change the status of the bug to 'Confirmed'.

If, due to the nature of the issue you have encountered, you are unable to run this command, please add a comment stating that fact and change the bug status to 'Confirmed'.

This change has been made by an automated script, maintained by the Ubuntu Kernel Team.

Changed in linux (Ubuntu):
status: New → Incomplete
tags: added: natty

apport information

tags: added: apport-collected oneiric
description: updated

Updated with apport information as requested. There may be errors amongst this data since the root fs journal had already been aborted by this point (apport-collect only worked after mounting a tmpfs over /root!).

Changed in linux (Ubuntu):
status: Incomplete → Confirmed
tags: removed: natty

Bug still exists in 3.0.0-9-generic-pae.

summary: - Xen blkfront i/o errors on kernel 3.0.0-8-generic-pae
+ Xen blkfront i/o errors on kernel 3.0.0-9-generic-pae
Malcolm Scott (malcscott) wrote :

Someone else has reported this bug to xen-devel: http://lists.xensource.com/archives/html/xen-devel/2011-08/msg00852.html

Apparently 3.0.2 works.

Thank you for taking the time to file a bug report on this issue.

However, given the number of bugs that the Kernel Team receives during any development cycle it is impossible for us to review them all. Therefore, we occasionally resort to using automated bots to request further testing. This is such a request.

We have noted that there is a newer version of the development kernel currently in the release pocket than the one you tested when this issue was found. Please test again with the newer kernel and indicate in the bug if this issue still exists or not.

If the bug still exists, change the bug status from Incomplete to New. If the bug no longer exists, change the bug status from Incomplete to Fix Released.

Thank you for your help.

Changed in linux:
status: New → Incomplete
tags: added: kernel-request-3.0.0-11.17

Bug still exists in 3.0.0-11.17.

Changed in linux:
status: Incomplete → New
summary: - Xen blkfront i/o errors on kernel 3.0.0-9-generic-pae
+ Xen blkfront i/o errors on kernel 3.0.0-11.17

This bug is missing log files that will aid in diagnosing the problem. From a terminal window please run:

apport-collect 824089

and then change the status of the bug to 'Confirmed'.

If, due to the nature of the issue you have encountered, you are unable to run this command, please add a comment stating that fact and change the bug status to 'Confirmed'.

This change has been made by an automated script, maintained by the Ubuntu Kernel Team.

Changed in linux:
status: New → Incomplete

Apport logs already present...

Changed in linux:
status: Incomplete → Confirmed

Thank you for taking the time to file a bug report on this issue.

However, given the number of bugs that the Kernel Team receives during any development cycle it is impossible for us to review them all. Therefore, we occasionally resort to using automated bots to request further testing. This is such a request.

We have noted that there is a newer version of the development kernel currently in the release pocket than the one you tested when this issue was found. Please test again with the newer kernel and indicate in the bug if this issue still exists or not.

If the bug still exists, change the bug status from Incomplete to Confirmed. If the bug no longer exists, change the bug status from Incomplete to Fix Released.

Thank you for your help.

Changed in linux:
status: Confirmed → Incomplete
tags: added: kernel-request-3.0.0-11.18
Brad Figg (brad-figg) wrote :

Thank you for taking the time to file a bug report on this issue.

However, given the number of bugs that the Kernel Team receives during any development cycle it is impossible for us to review them all. Therefore, we occasionally resort to using automated bots to request further testing. This is such a request.

We have noted that there is a newer version of the development kernel currently in the release pocket than the one you tested when this issue was found. Please test again with the newer kernel and indicate in the bug if this issue still exists or not.

If the bug still exists, change the bug status from Incomplete to Incomplete. If the bug no longer exists, change the bug status from Incomplete to Fix Released.

Thank you for your help.

Bug still present in 3.0.0-11.18

summary: - Xen blkfront i/o errors on kernel 3.0.0-11.17
+ Xen blkfront i/o errors on kernel 3.0.0-11.18
Changed in linux:
status: Incomplete → Confirmed

Thank you for taking the time to file a bug report on this issue.

However, given the number of bugs that the Kernel Team receives during any development cycle it is impossible for us to review them all. Therefore, we occasionally resort to using automated bots to request further testing. This is such a request.

We have noted that there is a newer version of the development kernel than the one you last tested when this issue was found. Please test again with the newer kernel and indicate in the bug if this issue still exists or not.

If the bug still exists, change the bug status from Incomplete to Confirmed. If the bug no longer exists, change the bug status from Incomplete to Fix Released.

Thank you for your help, we really do appreciate it.

Changed in linux:
status: Confirmed → Incomplete
tags: added: kernel-request-3.0.0-12.19

Thank you for taking the time to file a bug report on this issue.

However, given the number of bugs that the Kernel Team receives during any development cycle it is impossible for us to review them all. Therefore, we occasionally resort to using automated bots to request further testing. This is such a request.

We have noted that there is a newer version of the development kernel than the one you last tested when this issue was found. Please test again with the newer kernel and indicate in the bug if this issue still exists or not.

If the bug still exists, change the bug status from Incomplete to Incomplete. If the bug no longer exists, change the bug status from Incomplete to Fix Released.

Thank you for your help, we really do appreciate it.

tags: added: kernel-request-3.0.0-12.20

Bug still present in 3.0.0-12.20.

Changed in linux:
status: Incomplete → Confirmed
summary: - Xen blkfront i/o errors on kernel 3.0.0-11.18
+ Xen blkfront i/o errors
summary: - Xen blkfront i/o errors
+ Xen blkfront i/o errors prevent boot in domU
Joseph Salisbury (jsalisbury) wrote :

Would it be possible for you to test the latest upstream kernel? It will allow additional upstream developers to examine the issue. Refer to https://wiki.ubuntu.com/KernelMainlineBuilds . Once you've tested the upstream kernel, please remove the 'needs-upstream-testing' tag. This can be done by clicking on the yellow pencil icon next to the tag located at the bottom of the bug description and deleting the 'needs-upstream-testing' text. Please let us know your results.

Thanks in advance.

tags: added: needs-upstream-testing

What is your dom0?

Malcolm Scott (malcscott) wrote :

Konrad: XenServer 5.6 SP2.

Malcolm Scott (malcscott) wrote :

The bug is present in mainline kernel 3.1rc9.

tags: removed: kernel-request-3.0.0-11.17 kernel-request-3.0.0-11.18 kernel-request-3.0.0-12.19 kernel-request-3.0.0-12.20 needs-upstream-testing

Thank you for taking the time to file a bug report on this issue.

However, given the number of bugs that the Kernel Team receives during any development cycle it is impossible for us to review them all. Therefore, we occasionally resort to using automated bots to request further testing. This is such a request.

We have noted that there is a newer version of the development kernel than the one you last tested when this issue was found. Please test again with the newer kernel and indicate in the bug if this issue still exists or not.

If the bug still exists, change the bug status from Incomplete to Confirmed. If the bug no longer exists, change the bug status from Incomplete to Fix Released.

Thank you for your help, we really do appreciate it.

Changed in linux:
status: Confirmed → Incomplete
tags: added: kernel-request-3.0.0-12.20
Malcolm Scott (malcscott) wrote :

Already confirmed on 3.0.0-12.20.

Changed in linux:
status: Incomplete → Confirmed
Tim Gardner (timg-tpi) wrote :

Malcolm - I believe XenServer 6 is the first version to be certified with Lucid (10.04) as a DomU. Are you at liberty to test that version pairing? If 11.10 then doesn't work as a DomU, we need to have a look. 11.10 should also work as a Dom0 on XS 6.

Stefan Bader (smb) wrote :

The messages about the barrier failing are, as such, just informational. The interesting part to me is that while xvda and xvdc see the capacity change message, there is nothing for xvdb. Maybe I missed the info, but it would be interesting to know what kind of mapping is used for the PV disks (probably all of them) in the Xen guest configuration. And maybe there is something wrong with whatever the mapping target is?

Changed in linux (Ubuntu):
status: Confirmed → Triaged
Malcolm Scott (malcscott) wrote :

For the record I've just tested linux-image-3.2.0-030200rc2-generic_3.2.0-030200rc2.201111151435 and the bug is still present. This is still on XenServer 5.6 SP2; I'll try to test on a XS 6 install later. (Lucid does work on both versions; the problem is only with Oneiric.)

Stefan: I hadn't spotted the capacity change discrepancy, but on my latest test with the above kernel I do get capacity change messages for all three disks:

[ 0.492436] blkfront: xvda: barrier: enabled
[ 0.495160] xvda: unknown partition table
[ 0.496794] blkfront: xvdb: barrier: enabled
[ 0.498990] xvdb: unknown partition table
[ 0.499217] Setting capacity to 524288
[ 0.499225] xvda: detected capacity change from 0 to 268435456
[ 0.499911] blkfront: xvdc: barrier: enabled
[ 0.501937] xvdc: unknown partition table
[ 0.502127] Setting capacity to 12582912
[ 0.502136] xvdb: detected capacity change from 0 to 6442450944
[ 0.507102] Setting capacity to 1024000
[ 0.507117] xvdc: detected capacity change from 0 to 524288000
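
The two kinds of capacity message above appear to use different units, which is easy to misread because the lines are interleaved across devices. Here is a quick check in Python, assuming "Setting capacity to N" counts 512-byte sectors and "detected capacity change" reports bytes, and assuming each "Setting capacity" line belongs to the next "detected capacity change" line that follows it. The function name `pair_capacities` is illustrative.

```python
import re

SECTOR_BYTES = 512  # assumption: "Setting capacity" is counted in 512-byte sectors

LOG = """\
[ 0.499217] Setting capacity to 524288
[ 0.499225] xvda: detected capacity change from 0 to 268435456
[ 0.502127] Setting capacity to 12582912
[ 0.502136] xvdb: detected capacity change from 0 to 6442450944
[ 0.507102] Setting capacity to 1024000
[ 0.507117] xvdc: detected capacity change from 0 to 524288000
"""


def pair_capacities(log):
    """Pair each sector count with the next capacity-change line, in log order."""
    sectors = [int(n) for n in re.findall(r"Setting capacity to (\d+)", log)]
    changes = re.findall(r"(xvd\w+): detected capacity change from 0 to (\d+)", log)
    rows = []
    for count, (dev, size) in zip(sectors, changes):
        size = int(size)
        # The row is consistent if sectors * 512 equals the reported byte size.
        rows.append((dev, count, size, count * SECTOR_BYTES == size))
    return rows


if __name__ == "__main__":
    for dev, count, size, ok in pair_capacities(LOG):
        print(f"{dev}: {count} sectors, {size / 2**20:.0f} MiB, consistent={ok}")
```

Under those assumptions all three disks are consistent (256 MiB, 6144 MiB and 500 MiB), so xvdb does report a capacity change in this boot.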

These disks are all XenServer's NFS-backed virtual disks. The bug does NOT occur if the root filesystem is on local storage.

summary: - Xen blkfront i/o errors prevent boot in domU
+ Xen blkfront i/o errors on NFS-backed disks prevent boot in domU

The bug also occurs on XenServer 6 (with Oneiric's kernel, on NFS-backed storage).

Malcolm Scott (malcscott) wrote :

Correction -- the bug DOES *sometimes* occur if the root filesystem is on local storage. On local storage, the bug has so far occurred on three out of six boot attempts. I have yet to see a successful boot on NFS storage.

summary: - Xen blkfront i/o errors on NFS-backed disks prevent boot in domU
+ Xen blkfront i/o errors prevent boot in domU
Aleksandr Gordeev (a-gordeev) wrote :

Hi folks, I found a workaround on the xen-devel mailing list.

Just disable barriers for ext4 by putting 'barrier=0' in your /etc/fstab.

The line for your root fs should look like this:
/dev/xvdc /blah ext4 errors=remount-ro,barrier=0 0 1
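
For several guests, the same edit can be scripted. This is a minimal Python sketch that appends the option to the fourth (options) field of an fstab line; `add_mount_option` is a hypothetical helper, not an existing tool, and it only automates the edit described above without making the workaround any more reliable. It leaves a line alone if that line already sets `barrier` to any value.

```python
def add_mount_option(fstab_line, option="barrier=0"):
    """Return fstab_line with `option` appended to its mount options.

    Comments, blank lines, malformed lines and lines that already set the
    same option key are returned unchanged. Whitespace between fields is
    normalised to single spaces on lines that are modified.
    """
    stripped = fstab_line.strip()
    if not stripped or stripped.startswith("#"):
        return fstab_line
    fields = stripped.split()
    if len(fields) < 4:  # not a complete fstab entry
        return fstab_line
    options = fields[3].split(",")
    key = option.split("=")[0]
    if any(existing.split("=")[0] == key for existing in options):
        return fstab_line
    fields[3] = ",".join(options + [option])
    return " ".join(fields)


if __name__ == "__main__":
    # Example entry modelled on the line quoted above.
    print(add_mount_option("/dev/xvdc /blah ext4 errors=remount-ro 0 1"))
```

Run it against a copy of /etc/fstab first and compare the result by eye before replacing the real file.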

Aleksandr Gordeev (a-gordeev) wrote :

Unfortunately, the workaround described above sometimes doesn't work, and xen-blkfront corrupts the fs again. With barriers enabled, though, it corrupts the fs on every second system boot.

Iustinian T. (iustinian) wrote :

Same on Lucid with the latest linux-image-virtual-lts-backport-oneiric 3.0.0-13-virtual #22~lucid1-Ubuntu SMP. Disabling barriers works, but since these are supposed to be production-ready boxes, the only option is to revert to linux-image-virtual-lts-backport-natty.

Michael MacLeod (mikemacleod) wrote :

I just want to add that I'm encountering this issue with a Debian Squeeze (Xen: 4.0.1) dom0. I'm using LVM volumes to store my disk images (phy:/dev/Data/foobar,xvda,w), and have installed the domU using netinstall. The install works fine, the error only happens when booting the domU for the first time from the installed kernel.

Malcolm Scott (malcscott) wrote :

This bug is still present in the 3.2.2-030202 mainline build.

Malcolm Scott (malcscott) wrote :

This bug is also still present in the 3.3.0-030300rc1 mainline build.

Stefan Bader (smb) wrote :

I looked back into the mailing list archives and found a thread titled "blkfront problem in pvops kernel when barriers enabled". As far as I understand, this is a problem in the backend driver (which is part of the host code): it offers the barrier method for syncing but then fails empty barrier requests. In my testing with an Ubuntu-based dom0 (Xen 4.1.2), the guest uses cache flush for syncs, and I did not experience any of the observed problems.

So this looks to me like it needs to be fixed in the host (Xenserver or Debian) by making sure to have the following change:

# HG changeset patch
From: Jan Beulich <email address hidden>
# Date 1306409621 -3600
# Node ID 876a5aaac0264cf38cae6581e5714b93ec380aaa
# Parent aedb712c05cf065e943e15d0f38597c2e80f7982
Subject: xen/blkback: don't fail empty barrier requests
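
To see which sync method a guest negotiated with its backend, the blkfront lines in dmesg can be classified per device. This is a rough Python sketch: the "barrier" and "barrier or flush" wordings are taken from the logs quoted in this report, while "flush diskcache" is an assumption about how the 3.x blkfront driver reports the cache-flush method, since no such line appears in this thread. The name `sync_methods` is illustrative.

```python
import re

# "barrier or flush" must be tried before "barrier" so the longer label wins.
# Assumption: "flush diskcache" is the label used for the cache-flush method.
SYNC_LINE = re.compile(
    r"blkfront: (xvd\w+): (barrier or flush|barrier|flush diskcache): (enabled|disabled)"
)


def sync_methods(dmesg_text):
    """Map each device to (method, state), keeping the last line seen per device."""
    result = {}
    for match in SYNC_LINE.finditer(dmesg_text):
        result[match.group(1)] = (match.group(2), match.group(3))
    return result


SAMPLE = """\
[ 0.492436] blkfront: xvda: barrier: enabled
[ 5.685998] blkfront: xvdb: barrier or flush: disabled
[ 0.500000] blkfront: xvdc: flush diskcache: enabled
"""  # the xvdc line is hypothetical, added only to exercise the third label

if __name__ == "__main__":
    for dev, (method, state) in sync_methods(SAMPLE).items():
        print(f"{dev}: {method} ({state})")
```

A guest on an affected host would be expected to show "barrier: enabled" early in boot and "barrier or flush: disabled" once the empty barrier write has failed, as in the logs above.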

Malcolm Scott (malcscott) wrote :

This turned out to be a XenServer bug related to the interaction between blkback and barriers; my contact at Citrix has produced a dom0 kernel patch which fixes the issue. I'm told the fix will be incorporated in the next point release of XenServer.

Changed in linux (Ubuntu):
status: Triaged → Invalid
Changed in linux:
status: Confirmed → Invalid
Matt Sealey (mwsealey) wrote :

Any chance your contact gave a better timeframe than "the next point release"? Two point releases have already come out (up to 6.0.2) and I'm still seeing the problem here at a most infuriating rate.

Is there some way of making sure that a new VM install won't have any problems with this? Once the system is booted, it seems fine here apart from the fact that the root filesystem is mounted read-only. I don't want to remove remount-ro from fstab in case a real filesystem error crops up...

Malcolm Scott (malcscott) wrote :

The release incorporating this fix is due for release around late July / early August. The only workaround I know for now is to install a patched kernel in the XenServer dom0. I have one and you may contact me by email if you would like a copy, but bear in mind that it is very much not supported by Citrix.

Malcolm Scott (malcscott) wrote :

I should clarify that the timeframe for the XenServer release (6.1) I gave above is an unofficial, unauthoritative estimate and subject to alteration by Citrix.

Torsten Krah (tkrah) wrote :

Still there with current precise LTS.

Changed in linux (Ubuntu):
status: Invalid → Confirmed
tags: added: precise
Malcolm Scott (malcscott) wrote :

Torsten, this is a XenServer bug and not an Ubuntu bug. No current release of XenServer is capable of running guests with recent Linux kernels. A fix is (still) pending from Citrix. If you want an unofficial fix (a replacement kernel for XenServer dom0, very unsupported) contact me privately. If you have bought XenServer, it can't hurt to raise a ticket with Citrix asking for the official patch to be published more quickly.

Changed in linux (Ubuntu):
status: Confirmed → Invalid
Nate Carlson (natecarlson) wrote :

I've hit similar issues with XFS, and used the not-so-lovely workaround of setting 'barrier=0' as mount options for that filesystem, which works great. Not sure if it'd work with ext4 or not.

@Malcolm, do you have a Ticket ID that I can reference in a support ticket to Citrix?

Side note - XenServer 6.1 sounds like it's still a ways out, at least last I heard.
