Ubuntu Server 18.04 LTS aacraid error

Bug #1777586 reported by Patrick Storms on 2018-06-19
This bug affects 2 people
Affects          Importance   Assigned to
linux (Ubuntu)   High         Unassigned
Bionic           High         Unassigned

Bug Description

I upgraded from Ubuntu 14.04 LTS to 18.04 LTS and am now running into these RAID adapter driver errors. The server ran fine on the older version. My apologies, as I lost the exact version, but it never had any errors like this version does.

Now, whenever I try to copy files to the RAID 5 drive or untar a file, I get these errors after a few MB of data have been written.

Linux batboat 4.15.0-23-generic #25-Ubuntu SMP Wed May 23 18:02:16 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux

batboat:/var/log$ lsb_release -rd
Description: Ubuntu 18.04 LTS
Release: 18.04

I have tried the IRQ debugging tips to no avail. I also installed Debian 9.4.0, and it showed this error only once, briefly. It appears to be much more resilient and otherwise works fine.

Jun 19 00:02:21 batboat kernel: [ 498.770839] aacraid: Host adapter reset request. SCSI hang ?
Jun 19 00:02:37 batboat kernel: [ 514.139167] aacraid: Host adapter reset request. SCSI hang ?
Jun 19 00:02:37 batboat kernel: [ 514.795083] aacraid 0000:03:09.0: Adapter health - 199
Jun 19 00:02:37 batboat kernel: [ 514.800376] aacraid 0000:03:09.0: outstanding cmd: midlevel-0
Jun 19 00:02:37 batboat kernel: [ 514.800378] aacraid 0000:03:09.0: outstanding cmd: lowlevel-0
Jun 19 00:02:37 batboat kernel: [ 514.800381] aacraid 0000:03:09.0: outstanding cmd: error handler-0
Jun 19 00:02:37 batboat kernel: [ 514.800383] aacraid 0000:03:09.0: outstanding cmd: firmware-5
Jun 19 00:02:37 batboat kernel: [ 514.800385] aacraid 0000:03:09.0: outstanding cmd: kernel-0
Jun 19 00:02:37 batboat kernel: [ 514.800391] sd 4:0:0:0: Device offlined - not ready after error recovery
Jun 19 00:02:37 batboat kernel: [ 514.800394] sd 4:0:0:0: Device offlined - not ready after error recovery
Jun 19 00:02:37 batboat kernel: [ 514.800396] sd 4:0:0:0: Device offlined - not ready after error recovery
Jun 19 00:02:37 batboat kernel: [ 514.800399] sd 4:0:0:0: Device offlined - not ready after error recovery
Jun 19 00:02:37 batboat kernel: [ 514.800401] sd 4:0:0:0: Device offlined - not ready after error recovery
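
To make excerpts like the one above easier to compare across kernels, here is a minimal sketch that pulls the key events out of syslog text: the first adapter reset request, the first "Device offlined" message, and the outstanding command counters. The sample lines are copied from this report; the function name `summarize` and the regular expression are illustrative assumptions, not part of any aacraid tooling.

```python
import re

# Sample lines copied from the log excerpt in this report.
# The bracketed number is the kernel timestamp in seconds since boot.
SAMPLE = """\
Jun 19 00:02:21 batboat kernel: [ 498.770839] aacraid: Host adapter reset request. SCSI hang ?
Jun 19 00:02:37 batboat kernel: [ 514.139167] aacraid: Host adapter reset request. SCSI hang ?
Jun 19 00:02:37 batboat kernel: [ 514.800383] aacraid 0000:03:09.0: outstanding cmd: firmware-5
Jun 19 00:02:37 batboat kernel: [ 514.800391] sd 4:0:0:0: Device offlined - not ready after error recovery
"""

# Assumed line layout: "... kernel: [ <seconds>] <message>"
LINE_RE = re.compile(r"kernel: \[\s*(?P<ts>\d+\.\d+)\]\s+(?P<msg>.*)$")


def summarize(text):
    """Return (first_reset_ts, first_offline_ts, outstanding) for syslog text."""
    first_reset = None
    first_offline = None
    outstanding = {}
    for line in text.splitlines():
        m = LINE_RE.search(line)
        if not m:
            continue
        ts, msg = float(m.group("ts")), m.group("msg")
        if "Host adapter reset request" in msg:
            if first_reset is None:
                first_reset = ts
        elif "Device offlined" in msg:
            if first_offline is None:
                first_offline = ts
        elif "outstanding cmd:" in msg:
            # "outstanding cmd: firmware-5" -> {"firmware": 5}
            field = msg.split("outstanding cmd:")[1].strip()
            name, _, count = field.rpartition("-")
            outstanding[name] = int(count)
    return first_reset, first_offline, outstanding


reset_ts, offline_ts, outstanding = summarize(SAMPLE)
print(f"first reset request at {reset_ts:.3f} s since boot")
print(f"device offlined {offline_ts - reset_ts:.3f} s after the first reset request")
print(f"outstanding commands: {outstanding}")
```

On the sample above, the gap between the first reset request and the device going offline is about 16 seconds, and the only non-zero counter is the firmware one.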

Patrick Storms (pstorms) wrote :

This bug is missing log files that will aid in diagnosing the problem. While running an Ubuntu kernel (not a mainline or third-party kernel) please enter the following command in a terminal window:

apport-collect 1777586

and then change the status of the bug to 'Confirmed'.

If, due to the nature of the issue you have encountered, you are unable to run this command, please add a comment stating that fact and change the bug status to 'Confirmed'.

This change has been made by an automated script, maintained by the Ubuntu Kernel Team.

Changed in linux (Ubuntu):
status: New → Incomplete
tags: added: bionic
Joseph Salisbury (jsalisbury) wrote :

Would it be possible for you to test the latest upstream kernel? Refer to https://wiki.ubuntu.com/KernelMainlineBuilds . Please test the latest v4.17 kernel[0].

If this bug is fixed in the mainline kernel, please add the following tag 'kernel-fixed-upstream'.

If the mainline kernel does not fix this bug, please add the tag: 'kernel-bug-exists-upstream'.

Once testing of the upstream kernel is complete, please mark this bug as "Confirmed".

Thanks in advance.

[0] http://kernel.ubuntu.com/~kernel-ppa/mainline/v4.18-rc1

Changed in linux (Ubuntu):
importance: Undecided → High
Joseph Salisbury (jsalisbury) wrote :

Also, does this bug go away if you select the prior kernel version from the GRUB menu?

Changed in linux (Ubuntu Bionic):
importance: Undecided → High
status: New → Incomplete
tags: added: kernel-da-key
Patrick Storms (pstorms) wrote :

I decided to try to eliminate some variables. I reformatted my RAID drives and rebuilt the RAID array in the hope of clearing up this issue, but neither step fixed what we are seeing here.

As to your request regarding selecting versions from the GRUB menu, I tried the two versions of the kernel that I have installed on this server.

The two kernels are:
4.15.0-20-generic
4.15.0-23-generic

This is an interesting development. When I first booted the earlier version (4.15.0-20), I did not get any errors. I tested several large copies to a USB drive with no issues; I tried six times to reproduce the problem without success. When I booted back into the newer version (4.15.0-23), it errored almost immediately. When I then rebooted into 4.15.0-20 again, it errored as well. So the issue is present in both releases.

I will try the updated kernel next.

Patrick Storms (pstorms) wrote :

I downloaded the new kernel as you requested and now have 4.17.2-041702-generic installed and booted. After loading the new kernel, I ran the same tests that previously failed with the aacraid errors, and it exhibits the same behaviour.

I mounted the USB drive at Jun 20 00:46:55 in the attached syslog file. I then copied from the RAID drive to the USB drive and the errors showed up again.

Patrick Storms (pstorms) wrote :

Another note: if I let the system run as a web server/email server, there are no issues, and I don't see any errors in the log files. The issue only arises when there is heavy I/O to the hardware RAID controller. The software RAID works fine; my boot drives are mirrored SSDs. The data drive is a RAID 5 array of three 1 TB drives on the Adaptec controller. Thanks for the help with this.
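
For anyone trying to reproduce the heavy-I/O condition described above in a repeatable way, the following is a rough sketch of a sustained write load. It is an assumption about how one might generate the load, not the exact test used in this report (which was copying and untarring files). The function name `write_load` and the scratch file name are hypothetical, and the target directory defaults to the system temp directory, so it must be pointed at the RAID 5 mount point to exercise the controller.

```python
import os
import tempfile
import time


def write_load(target_dir, total_mib=64, chunk_mib=4):
    """Write about total_mib MiB to a scratch file in target_dir.

    The file is fsync'ed so the data reaches the device rather than
    staying in the page cache, then removed. Returns (MiB written, MiB/s).
    """
    chunk = os.urandom(chunk_mib * 1024 * 1024)
    path = os.path.join(target_dir, "io_load_test.bin")  # hypothetical name
    start = time.monotonic()
    with open(path, "wb") as f:
        written = 0
        while written < total_mib:
            f.write(chunk)
            written += chunk_mib
        f.flush()
        os.fsync(f.fileno())
    elapsed = time.monotonic() - start
    size_mib = os.path.getsize(path) / (1024 * 1024)
    os.remove(path)
    rate = size_mib / elapsed if elapsed > 0 else float("inf")
    return size_mib, rate


if __name__ == "__main__":
    # Assumption: replace this with the mount point of the RAID 5 volume.
    target = tempfile.gettempdir()
    size, rate = write_load(target, total_mib=16)
    print(f"wrote {size:.0f} MiB at {rate:.1f} MiB/s to {target}")
```

Running this in a loop while following the kernel log in another terminal should show whether the aacraid reset messages correlate with the write load on a given kernel.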

Patrick Storms (pstorms) wrote :

For giggles, I downloaded the latest kernel available today, 4.18-rc1, to try. It too exhibits the same thing. No change.

Patrick Storms (pstorms) wrote :

For more information, I am now going back through older kernels. I tried 4.14.50 and it too errors.

Patrick Storms (pstorms) wrote :

I have now gone back to 4.13.16 as a test, and it too exhibits the same behaviour. I think I have chased enough kernels for now. If there is anything else you need, please let me know. Thank you for your help in this matter. It took approximately 283 seconds before the first "aacraid: Host adapter abort request" error message appeared.

Patrick Storms (pstorms) wrote :

I am unable to run the command requested by the automated kernel bot (apport-collect 1777586). I am setting the bug status to Confirmed as requested.

Changed in linux (Ubuntu Bionic):
status: Incomplete → Confirmed
Changed in linux (Ubuntu):
status: Incomplete → Confirmed
Daniel Reinhardt (cryptodan) wrote :

This bug goes all the way back to CentOS 5 and kernel 2.6.

I have a stable machine with the following system:

cryptodan@capricorn:~$ inxi -Fxxxrpc0
System: Host: capricorn Kernel: 3.13.0-24-generic i686 (32 bit, gcc: 4.8.2) Console: tty 1 Distro: Ubuntu 14.04 trusty
Machine: System: Dell product: PowerEdge 4600 Chassis: type: 17
           Mobo: Dell model: 0H3009 version: A00 Bios: Dell version: A13 date: 10/21/2004
CPU(s): 2 Single core Intel Xeon CPUs (-HT-SMP-) cache: 1024 KB flags: (pae sse sse2) bmips: 11961.4
           Clock Speeds: 1: 2990.346 MHz 2: 2990.346 MHz 3: 2990.346 MHz 4: 2990.346 MHz
Graphics: Card: Advanced Micro Devices [AMD/ATI] Rage XL PCI bus-ID: 00:0e.0 chip-ID: 1002:4752
           X-Vendor: N/A driver: N/A tty size: 100x35 Advanced Data: N/A out of X
Network: Card-1: Intel 82557/8/9/0/1 Ethernet Pro 100
           driver: e100 ver: 3.5.24-k2-NAPI port: e8c0 bus-ID: 00:08.0 chip-ID: 8086:1229
           IF: eth2 state: down mac: 00:02:b3:4b:1b:d9
           Card-2: Intel 82546EB Gigabit Ethernet Controller (Copper)
           driver: e1000 ver: 7.3.21-k8-NAPI port: bcc0 bus-ID: 08:06.0 chip-ID: 8086:1010
           IF: eth0 state: down mac: 00:04:23:d0:b5:e2
           Card-3: Intel 82546EB Gigabit Ethernet Controller (Copper)
           driver: e1000 ver: 7.3.21-k8-NAPI port: bc80 bus-ID: 08:06.1 chip-ID: 8086:1010
           IF: eth1 state: up speed: 1000 Mbps duplex: full mac: 00:04:23:d0:b5:e3
Drives: HDD Total Size: 2099.6GB (0.1% used)
           1: id: /dev/sda model: system size: 300.0GB serial: 8EDB485F temp: 0C
           2: id: /dev/sdb model: homepart size: 1799.6GB serial: 326F485F temp: 0C
Partition: ID: / size: 92G used: 377M (1%) fs: ext4 ID: /boot size: 922M used: 35M (5%) fs: ext4
           ID: /usr size: 92G used: 745M (1%) fs: ext4 ID: /var size: 69G used: 527M (1%) fs: ext4
           ID: /home size: 1.7T used: 69M (1%) fs: ext4 ID: swap-1 size: 24.00GB used: 0.00GB (0%) fs: swap
RAID: System: supported: N/A
           No RAID devices detected - /proc/mdstat and md_mod kernel raid module present
           Unused Devices: none
Sensors: None detected - is lm-sensors installed and configured?
Repos: Active apt sources in file: /etc/apt/sources.list
           deb http://us.archive.ubuntu.com/ubuntu/ trusty main restricted
           deb-src http://us.archive.ubuntu.com/ubuntu/ trusty main restricted
           deb http://us.archive.ubuntu.com/ubuntu/ trusty-updates main restricted
           deb-src http://us.archive.ubuntu.com/ubuntu/ trusty-updates main restricted
           deb http://us.archive.ubuntu.com/ubuntu/ trusty universe
           deb-src http://us.archive.ubuntu.com/ubuntu/ trusty universe
           deb http://us.archive.ubuntu.com/ubuntu/ trusty-updates universe
           deb-src http://us.archive.ubuntu.com/ubuntu/ trusty-updates universe
           deb http://us.archive.ubuntu.com/ubuntu/ trusty multiverse
           deb-src http://us.archive.ubuntu.com/ubuntu/ trusty multiverse
           deb http://us.archive.ubuntu.com/ubuntu/ trusty-updates multiverse
           deb-src http://us.archive.ubuntu.com/ubuntu/ trusty-updates multiverse
           deb ...
