stex driver (Promise SuperTrak 8350/4650,etc) produces drastic I/O errors/corruption with 10.04 or later

Bug #586897 reported by Bryan
This bug affects 3 people
Affects              Status         Importance   Assigned to   Milestone
Linux                Fix Released   Unknown
linux (Ubuntu)       Invalid        Undecided    Unassigned
linux-2.6 (Debian)   Fix Released   Unknown

Bug Description

Server with the Promise Supertrak EX 8350 functions fine in 9.04 (i386).

After a fresh install of 10.04 (i386):

* Random files become corrupt
* Running mkfs on storage1 or storage2 hangs the system, forcing a reboot (ext4 and xfs). After a couple of minutes, we receive an error message about a process hang timeout of 120 seconds, along with an echo command to disable the message.
* System can take an abnormal amount of time to mount /
* "on-boot" fsck comes back fine
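
The 120-second message in the list above matches the kernel's hung-task warning. As a rough sketch (not part of the original report), the Python below scans log text for such warnings. The pattern assumes the format the kernel's hung-task detector normally prints; the sample line, including its task name and PID, is invented for illustration.

```python
import re

# Format normally printed by the kernel's hung-task detector, e.g.
#   INFO: task mkfs.xfs:1234 blocked for more than 120 seconds.
HUNG_TASK = re.compile(
    r"INFO: task (?P<name>.+):(?P<pid>\d+) blocked for more than (?P<secs>\d+) seconds"
)

def find_hung_tasks(log_text):
    """Return (task name, pid, timeout in seconds) for each hung-task warning."""
    return [
        (m.group("name"), int(m.group("pid")), int(m.group("secs")))
        for m in HUNG_TASK.finditer(log_text)
    ]

# Hypothetical sample line; the task name and PID are made up.
sample = "[  361.100000] INFO: task mkfs.xfs:1234 blocked for more than 120 seconds.\n"
print(find_hung_tasks(sample))
```

Feeding it the output of dmesg or the contents of a saved kernel log would show which processes were stuck and for how long.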

System layout:
/dev/sda1: 256 MB ext2 (mnt /boot)
/dev/sda2: 2TB lvm
/dev/sdb1: 2TB lvm
/dev/sdc1: 1.6TB lvm

LVG: 5.6TB

LV system: 8 GB ext4 (mnt /)
LV swap: 2 GB swap
LV storage1: 1.4TB xfs
LV storage2: 3.5TB xfs

Bryn Hughes (linux-nashira) wrote :

I have similar problems with 10.04. 9.04 worked fine, but with 10.04 I'm getting read errors and controller resets. Anything write-related seems fine, but read-intensive transactions seem to have big problems. Definitely getting corrupted files/data too.

Bryn Hughes (linux-nashira) wrote :

Note I have reverted back to 9.04 and there are no problems. 10.10 / 10.04 are broken.

Changed in ubuntu:
status: New → Confirmed
summary: - promise supertrak 8350 fails in 10.04
+ Promise SuperTrak 8350 drastic I/O errors with 10.04 or later
affects: ubuntu → linux (Ubuntu)
Chris Smith (chris-nevermind) wrote : Re: Promise SuperTrak 8350 drastic I/O errors with 10.04 or later

Hi,

I'm seeing random file corruption when I install Ubuntu 10.04 LTS - notably in libperl and apt libraries.

Also, when the system is under load it will reliably output incorrect md5sums etc...

Feel free to contact me regarding any further information

Cheers,
Chris.
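
The incorrect md5sums described above can be checked for in a repeatable way. The following is a minimal sketch, not something posted in this thread: it hashes the same file several times and counts how many passes disagree with the first. The function names and the default of 10 passes are assumptions chosen for illustration. Note that repeated reads of a small file are usually served from the page cache, so a file larger than RAM (or dropping caches between passes) is needed to actually exercise the controller.

```python
import hashlib

def file_md5(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of the file at `path`, read in 1 MiB chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

def count_mismatches(path, passes=10):
    """Hash `path` `passes` times; count passes that differ from the first."""
    reference = file_md5(path)
    return sum(1 for _ in range(passes - 1) if file_md5(path) != reference)
```

On healthy storage `count_mismatches` should return 0 for a file that is not being modified; any non-zero result on such a file points at read corruption somewhere between the disk and memory.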

Chris Smith (chris-nevermind) wrote :

Just thought I'd add that my card is Promise SuperTrak EX 4650

Bryn Hughes (linux-nashira) wrote :

Tested under 10.10 as well - problems still exist. 10.10 install completes OK but system can barely boot on its own. Many commands report corrupted libraries or files.

summary: - Promise SuperTrak 8350 drastic I/O errors with 10.04 or later
+ stex driver (Promise SuperTrak 8350/4650,etc) produces drastic I/O
+ errors/corruption with 10.04 or later
Chris Smith (chris-nevermind) wrote :

I successfully installed Debian Testing (squeeze) on this machine and everything seems to be running fine.

One big difference is that Ubuntu 10.XX uses ext4 by default and squeeze/Ubuntu 9.04, where it apparently works fine, uses ext3. Is there an incompatibility here?

Bryn Hughes (linux-nashira) wrote :

Chris, can you post the stex driver version on the Debian machine (and on 10.04/later if you can) - I had to revert and get my machine back online.

Working configuration (9.04) - stex 4.6.0000.3

You can find the version by entering 'dmesg | grep stex'
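
For anyone collecting versions from several machines, the same lookup can be scripted. This is only a sketch: the function name is made up, and the pattern assumes the driver's log line looks like the dmesg output posted later in this thread ("stex: Promise SuperTrak EX Driver version: ...").

```python
import re

# Pattern based on the stex log line quoted later in this thread.
STEX_VERSION = re.compile(r"stex: Promise SuperTrak EX Driver version:\s*(\S+)")

def stex_driver_version(dmesg_text):
    """Return the stex driver version found in dmesg output, or None."""
    match = STEX_VERSION.search(dmesg_text)
    return match.group(1) if match else None

# Sample line copied from the dmesg output posted in this thread.
sample = "[ 0.454074] stex: Promise SuperTrak EX Driver version: 4.6.0000.3\n"
print(stex_driver_version(sample))
```

To run it against a live system, pass in the text captured from dmesg, for example via `subprocess.run(["dmesg"], capture_output=True, text=True).stdout`; on some systems reading the kernel log requires root.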

Bryn Hughes (linux-nashira) wrote :

Sorry, meant 9.10 above (which works fine).

Chris Smith (chris-nevermind) wrote :

I may have spoken too soon regarding my Debian install - I had a spate of segfaults in ld-2.11.2.so, indicating file corruption.

Chris Smith (chris-nevermind) wrote :

[ 0.454074] stex: Promise SuperTrak EX Driver version: 4.6.0000.3
[ 0.454099] stex 0000:07:00.0: PCI INT A -> GSI 16 (level, low) -> IRQ 16
[ 0.454103] stex 0000:07:00.0: setting latency timer to 64

Same driver version as 9.04!

I guess that narrows it down to subsystem changes in recent revisions of the kernel. What is the kernel version in your 9.10 install? I wonder whether the problem can be reproduced if you install a newer kernel from the PPA.

Bryn Hughes (linux-nashira) wrote :

I'd actually meant 9.10 above (I know I said 9.04)...

Kernel version is 2.6.31-22-server

Unfortunately I now have real data on this system so I can't experiment with it anymore. I can however confirm that ext4 under 9.10 / 2.6.31-22-server works fine...

Chris Smith (chris-nevermind) wrote :

md kernel hackers have apparently found and squashed this bug as detailed here:

http://marc.info/?l=linux-scsi&m=129021716922966&w=2

Are we going to see a backport for this patch to the Lucid kernel? Probably a good idea to spin it into the next installation media too (10.04.2?) as this can damage a system from the very beginning.

Jeremy Foshee (jeremyfoshee) wrote :

From the discussion listed here: http://marc.info/?t=129070303200002&r=1&w=2 this fix may still be in a state of flux. I am going to research a bit and will post an update here tomorrow with the information I find.

Thanks!

~JFo

Changed in linux:
status: Unknown → Confirmed
Changed in linux-2.6 (Debian):
status: Unknown → Confirmed
Bryn Hughes (linux-nashira) wrote :

Are there any updates on this? I would rather not upgrade until such time that I know the fix has been committed, but 9.10 is now getting pretty old!

Chris Smith (chris-nevermind) wrote :

According to the debbug, the fix has been committed to 2.6.36.3:

http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=604049#56

Is this patch available for a backport to a Lucid kernel and install media? This is a popular piece of hardware.

Chris Smith (chris-nevermind) wrote :

I spoke to the helpful team in #ubuntu-kernel and managed to get this information:

- The fix was included in the backported natty kernel available on 10.04

- The fix was also included in a recent Lucid stable kernel update: "2.6.32.28+drm33.12 stable release" (Bug #705045)

- There is a new 10.04 install media release due at the end of July and this will include the driver fix for this RAID card

So, upgrading a Karmic installation to Lucid now will give you a kernel that includes the fix and allows this card to work correctly. If you need to install a new system, updated installation media that include the required kernel fix should be released at the end of July.

Once I get a chance to test this I will report back.

Bryn Hughes (linux-nashira) wrote :

I can confirm that this is fixed in 10.04.2 PROVIDED you get there via upgrading from 9.10 first. Tested with 2.6.32-33-server.

Note that the maverick-backports kernel (2.6.35) does NOT work - it continues to produce excessive errors / controller resets.
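
Because a newer kernel series (2.6.35) was still reported broken here, "newer than 2.6.32-33" cannot be treated as "fixed". The sketch below reflects that: it only reports whether a release string is in the same series as the kernel tested above, with an equal or higher ABI number. The function names and the assumption that release strings follow the Ubuntu "x.y.z-abi-flavour" layout are illustrative, and a False result means "not covered by this report", not "known broken".

```python
import re

# Ubuntu-style kernel release string, e.g. "2.6.32-33-server".
RELEASE = re.compile(r"^(\d+)\.(\d+)\.(\d+)-(\d+)-([\w-]+)$")

def parse_release(release):
    """Split a release string into ((major, minor, patch), abi, flavour)."""
    match = RELEASE.match(release.strip())
    if match is None:
        raise ValueError(f"unrecognised kernel release: {release!r}")
    major, minor, patch, abi = (int(part) for part in match.groups()[:4])
    return (major, minor, patch), abi, match.group(5)

def covered_by_report(release, tested="2.6.32-33-server"):
    """True if `release` is in the same series as `tested` with abi >= tested abi.

    False means "not covered by the report", not "known broken".
    """
    base, abi, _ = parse_release(release)
    tested_base, tested_abi, _ = parse_release(tested)
    return base == tested_base and abi >= tested_abi
```

On a running system the release string can be obtained with `platform.release()` (the same value as `uname -r`); strings that do not follow the assumed layout raise `ValueError` rather than being guessed at.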

Bryan (watermark86) wrote :

I've been running a fresh install of 10.04.3 (as a NAS, so it has a decent load) without issues for about a week. I think this issue is fully resolved.

Changed in linux:
status: Confirmed → Fix Released
Changed in linux-2.6 (Debian):
status: Confirmed → Fix Released
penalvch (penalvch) wrote :

Bryan, this bug report is being closed due to your last comment regarding this being fixed with an update. For future reference you can manage the status of your own bugs by clicking on the current status in the yellow line and then choosing a new status in the revealed drop down box. You can learn more about bug statuses at https://wiki.ubuntu.com/Bugs/Status. Thank you again for taking the time to report this bug and helping to make Ubuntu better. Please submit any future bugs you may find.

Changed in linux (Ubuntu):
status: Confirmed → Invalid