Copying large files (10G) causes a SW reset of the SATA channel / drive.

Bug #253011 reported by Michael Nagel
Affects: linux (Ubuntu)
Status: Invalid
Importance: Undecided
Assigned to: Unassigned

Bug Description

Hi,

Problem:
  The machine becomes somewhat unresponsive during the copy: it floods dmesg and, from what it looks like,
repeatedly attempts to reset the controller.

Environment:

On my Dom0 machine, I copy disk images (DomUs) so I can rebuild various software environments. In this
particular instance, I was attempting to overwrite an existing image with a pristine one (cp build.img build-johnny.img),
where the image is 10G. I'm running Ubuntu 8.04.1 on an AMD64 box with an ICH9 controller and
a WDC WD2500AAJS-7 drive.

An excerpt of the messages:
[ 1194.969287] PCI-DMA: Out of SW-IOMMU space for 49152 bytes at device 0000:00:1f.2
[ 1194.990965] ata1.00: exception Emask 0x0 SAct 0x10 SErr 0x0 action 0x0
[ 1194.990990] ata1.00: cmd 61/00:20:ae:47:12/04:00:02:00:00/40 tag 4 ncq 524288 out
[ 1194.990991] res 40/00:14:ae:37:12/00:00:02:00:00/40 Emask 0x40 (internal error)
[ 1194.991020] ata1.00: status: { DRDY }
[ 1194.993547] ata1.00: configured for UDMA/133
[ 1194.993552] ata1: EH complete
[ 1194.368923] sd 0:0:0:0: [sda] 488281250 512-byte hardware sectors (250000 MB)
[ 1194.368935] sd 0:0:0:0: [sda] Write Protect is off
[ 1194.368937] sd 0:0:0:0: [sda] Mode Sense: 00 3a 00 00
[ 1194.368950] sd 0:0:0:0: [sda] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
[ 1195.117981] PCI-DMA: Out of SW-IOMMU space for 32768 bytes at device 0000:00:1f.2
[ 1195.155890] ata1.00: exception Emask 0x0 SAct 0x20 SErr 0x0 action 0x0
[ 1195.155923] ata1.00: cmd 61/00:28:b6:5f:12/04:00:02:00:00/40 tag 5 ncq 524288 out
[ 1195.155924] res 40/00:0c:1e:52:12/00:00:02:00:00/40 Emask 0x40 (internal error)
[ 1195.155953] ata1.00: status: { DRDY }
[ 1195.157974] ata1.00: configured for UDMA/133
[ 1195.157980] ata1: EH complete
[ 1195.159071] PCI-DMA: Out of SW-IOMMU space for 65536 bytes at device 0000:00:1f.2
[ 1195.177964] ata1.00: exception Emask 0x0 SAct 0x10 SErr 0x0 action 0x0
[ 1195.177988] ata1.00: cmd 61/00:20:b6:6f:12/04:00:02:00:00/40 tag 4 ncq 524288 out
[ 1195.177989] res 40/00:04:b6:5f:12/00:00:02:00:00/40 Emask 0x40 (internal error)
[ 1195.178018] ata1.00: status: { DRDY }
[ 1195.180487] ata1.00: configured for UDMA/133
[ 1195.180492] ata1: EH complete
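
For readers unfamiliar with the `cmd` lines in the excerpt above, here is a minimal Python sketch that decodes them. It assumes the usual libata field layout (command/feature:nsect:lbal:lbam:lbah/hob_feature:hob_nsect:hob_lbal:hob_lbam:hob_lbah/device) and the NCQ convention that the sector count is carried in the feature fields and the tag in the upper bits of nsect. The function name `decode_ncq_cmd` is made up for illustration and is not part of any tool mentioned in this report.

```python
import re

# Assumed libata layout of a "cmd" line:
# command/feature:nsect:lbal:lbam:lbah/hob_feature:hob_nsect:hob_lbal:hob_lbam:hob_lbah/device
CMD_RE = re.compile(
    r"cmd (?P<cmd>[0-9a-f]{2})"
    r"/(?P<lo>(?:[0-9a-f]{2}:){4}[0-9a-f]{2})"
    r"/(?P<hi>(?:[0-9a-f]{2}:){4}[0-9a-f]{2})"
    r"/(?P<dev>[0-9a-f]{2})"
)

WRITE_FPDMA_QUEUED = 0x61  # ATA opcode for an NCQ write


def decode_ncq_cmd(line):
    """Decode one libata 'cmd' line (hypothetical helper); None if no match."""
    m = CMD_RE.search(line)
    if m is None:
        return None
    feature, nsect, lbal, lbam, lbah = (int(x, 16) for x in m["lo"].split(":"))
    hob_feature, _hob_nsect, hob_lbal, hob_lbam, hob_lbah = (
        int(x, 16) for x in m["hi"].split(":")
    )
    command = int(m["cmd"], 16)
    # 48-bit LBA: the "hob" (high order byte) fields hold the upper 24 bits.
    lba = (
        (hob_lbah << 40) | (hob_lbam << 32) | (hob_lbal << 24)
        | (lbah << 16) | (lbam << 8) | lbal
    )
    # For NCQ commands the sector count lives in the feature fields.
    # (A count of 0 conventionally means 65536 sectors; not handled here.)
    sectors = (hob_feature << 8) | feature
    return {
        "command": command,
        "is_ncq_write": command == WRITE_FPDMA_QUEUED,
        "tag": nsect >> 3,  # NCQ tag sits in bits 7:3 of nsect
        "lba": lba,
        "sectors": sectors,
        "bytes": sectors * 512,
    }


if __name__ == "__main__":
    sample = "ata1.00: cmd 61/00:20:ae:47:12/04:00:02:00:00/40 tag 4 ncq 524288 out"
    print(decode_ncq_cmd(sample))
```

Under these assumptions, each failing command in the excerpt decodes as a 1024-sector (524288-byte) NCQ write, which agrees with the `tag` and `ncq 524288 out` values the kernel prints on the same line.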

Leann Ogasawara (leannogasawara) wrote :

The Ubuntu Kernel Team is planning to move to the 2.6.27 kernel for the upcoming Intrepid Ibex 8.10 release. As a result, the kernel team would appreciate it if you could please test this newer 2.6.27 Ubuntu kernel. There are two ways you should be able to test:

1) If you are comfortable installing packages on your own, the linux-image-2.6.27-* package is currently available for you to install and test.

--or--

2) The upcoming Alpha5 for Intrepid Ibex 8.10 will contain this newer 2.6.27 Ubuntu kernel. Alpha5 is set to be released Thursday Sept 4. Please watch http://www.ubuntu.com/testing for Alpha5 to be announced. You should then be able to test via a LiveCD.

Please let us know immediately if this newer 2.6.27 kernel resolves the bug reported here or if the issue remains. More importantly, please open a new bug report for each new bug/regression introduced by the 2.6.27 kernel and tag the bug report with 'linux-2.6.27'. Also, please specifically note if the issue does or does not appear in the 2.6.26 kernel. Thanks again, we really appreciate your help and feedback.

Neil Munro (neilmunro-deactivatedaccount) wrote :

The Intrepid Ibex 8.10 Beta release was recently announced - http://www.ubuntu.com/testing/intrepid/beta . It contains the 2.6.27 Ubuntu kernel. It would be great if you could test and verify whether this is still an issue. The status is being set to Incomplete until we receive further feedback. Thanks.

Changed in linux:
status: New → Incomplete
timtim (tim-ellis) wrote :

I also have this issue with an ICH9-based controller, with similar messages in the kernel ring buffer, on Ubuntu Server 8.04. I am trying to use dd to copy large LVM2 volumes to clone VMs - in my case the issue happens intermittently.

johnny5 (johnny-trustcommerce) wrote :

Hi,

Unfortunately I'm not familiar with the Ubuntu release process, and going to an Alpha as described by Leann or a Beta as described by Neil might create more of a headache for me to maintain than it's worth (newer packages, dependencies, etc.). I would probably also have to get some additional hardware, as I already have a more or less working setup minus that...

So I guess if you could assist with option #1, that would be appreciated. I guess all I need is a build of the Ubuntu variant of the 2.6.27 kernel (xen) which uses the same gcc available for that release, and I'm somewhat familiar with using dpkg. I can retest it from there as appropriate. Or alternatively, if simply waiting for the 8.10 release makes more sense, then maybe that's the best option.

timtim (tim-ellis) wrote :

This issue actually seems to occur whenever IO is happening; however, I guess it's only noticeable when it actually manages to kill a large file transfer for some reason.
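
One way to check whether the failures really show up during ordinary IO, and not only during large copies, would be to tally the `Out of SW-IOMMU space` messages in a captured kernel log. The following Python sketch is only an illustration: it assumes the message format shown in the excerpt in the bug description, and the function name `summarize_iommu_failures` is hypothetical.

```python
import re
from collections import Counter

# Message format taken from the excerpt in the bug description.
IOMMU_RE = re.compile(
    r"\[\s*(?P<ts>\d+\.\d+)\]\s+PCI-DMA: Out of SW-IOMMU space "
    r"for (?P<size>\d+) bytes at device (?P<dev>\S+)"
)


def summarize_iommu_failures(lines):
    """Tally SW-IOMMU exhaustion messages (hypothetical helper).

    Returns None when no such message is present in the given lines.
    """
    sizes = []
    stamps = []
    devices = Counter()
    for line in lines:
        m = IOMMU_RE.search(line)
        if m is None:
            continue
        sizes.append(int(m["size"]))
        stamps.append(float(m["ts"]))
        devices[m["dev"]] += 1
    if not sizes:
        return None
    return {
        "count": len(sizes),
        "total_bytes": sum(sizes),
        "largest_request": max(sizes),
        "first_seen": min(stamps),
        "last_seen": max(stamps),
        "devices": dict(devices),
    }


if __name__ == "__main__":
    sample = [
        "[ 1194.969287] PCI-DMA: Out of SW-IOMMU space for 49152 bytes at device 0000:00:1f.2",
        "[ 1194.993552] ata1: EH complete",
        "[ 1195.117981] PCI-DMA: Out of SW-IOMMU space for 32768 bytes at device 0000:00:1f.2",
        "[ 1195.159071] PCI-DMA: Out of SW-IOMMU space for 65536 bytes at device 0000:00:1f.2",
    ]
    print(summarize_iommu_failures(sample))
```

Comparing the `first_seen` and `last_seen` timestamps against the times at which a copy was running would show whether the messages also appear outside large transfers.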

Michael Nagel (nailor)
Changed in linux:
status: Incomplete → New
kernel-janitor (kernel-janitor) wrote :

Hi Michael,

This bug was reported a while ago and there hasn't been any activity in it recently. We were wondering if this is still an issue? Can you try with the latest development release of Ubuntu? ISO CD images are available from http://cdimage.ubuntu.com/releases/ .

If it remains an issue, could you run the following command from a Terminal (Applications->Accessories->Terminal). It will automatically gather and attach updated debug information to this report.

apport-collect -p linux-image-`uname -r` 253011

Also, if you could test the latest upstream kernel available that would be great. It will allow additional upstream developers to examine the issue. Refer to https://wiki.ubuntu.com/KernelMainlineBuilds . Once you've tested the upstream kernel, please remove the 'needs-upstream-testing' tag. This can be done by clicking on the yellow pencil icon next to the tag located at the bottom of the bug description and deleting the 'needs-upstream-testing' text. Please let us know your results.

Thanks in advance.

[This is an automated message. Apologies if it has reached you inappropriately; please just reply to this message indicating so.]

tags: added: needs-kernel-logs
tags: added: needs-upstream-testing
tags: added: kj-triage
Changed in linux (Ubuntu):
status: New → Incomplete
Jeremy Foshee (jeremyfoshee) wrote :

This bug report was marked as Incomplete and has not had any updated comments for quite some time. As a result this bug is being closed. Please reopen if this is still an issue in the current Ubuntu release http://www.ubuntu.com/getubuntu/download . Also, please be sure to provide any requested information that may have been missing. To reopen the bug, click on the current status under the Status column and change the status back to "New". Thanks.

[This is an automated message. Apologies if it has reached you inappropriately; please just reply to this message indicating so.]

tags: added: kj-expired
Changed in linux (Ubuntu):
status: Incomplete → Invalid