QLogic 4052C (qla4xxx) controller does not scan the iSCSI bus properly in feisty kernel

Bug #118846 reported by Joe Kislo
Affects                        Status      Importance   Assigned to   Milestone
linux (Ubuntu)                 Won't Fix   Undecided    Unassigned
linux-source-2.6.20 (Ubuntu)   Won't Fix   Undecided    Unassigned

Bug Description

Binary package hint: linux-source-2.6.20

We have a QLogic 4052C in a number of servers. It's an iSCSI HBA that offloads both TCP and iSCSI processing. Basically, it makes your iSCSI targets show up as normal /dev/sda devices.

Before the feisty kernel, we had to compile the QLogic driver from the QLogic website. Everything was good times, and everything was very stable. I also had to compile multipath from source, because of a bug in previous versions of multipath (fixed in feisty). I could add my SAN partitions to /etc/fstab, and everything would work.

Starting in feisty, the qla4xxx driver has been included in the kernel... But it doesn't work out of the box... and it took me a long time to figure out a workaround.

First, when the qla4xxx driver loads, I get these messages:
[ 75.281797] iscsi: registered transport (qla4xxx)
[ 75.292278] HP CISS Driver (v 3.6.14)
[ 75.305738] ohci_hcd: 2006 August 04 USB 1.1 'Open' Host Controller (OHCI) Driver
[ 75.336293] eth3: Tigon3 [partno(349321-001) rev 2100 PHY(5704)] (PCIX:133MHz:64-bit) 10/100/1000Base-T Ethernet 00:15:60:aa:3b:e3
[ 75.359891] eth3: RXcsums[1] LinkChgREG[0] MIirq[0] ASF[0] Split[0] WireSpeed[1] TSOcap[1]
[ 75.376539] eth3: dma_rwctrl[769f4000] dma_mask[64-bit]
[ 75.387043] md: raid10 personality registered for level 10
[ 75.387047] ACPI: PCI Interrupt 0000:05:07.1[B] -> GSI 35 (level, low) -> IRQ 35
[ 75.412826] ACPI: PCI Interrupt 0000:02:04.0[A] -> GSI 24 (level, low) -> IRQ 24
[ 75.427676] qla4xxx 0000:05:07.1: Found an ISP4022, irq 35, iobase 0xffffc2000001c000
[ 75.445972] Losing some ticks... checking if CPU frequency changed.
[ 75.446092] qla4xxx 0000:05:07.1: Configuring PCI space...
[ 75.459721] qla4xxx 0000:05:07.1: Configuring NVRAM ...
[ 75.545962] qla4xxx 0000:05:07.1: Starting firmware ...
[ 75.559089] cciss0: <0x46> at PCI 0000:02:04.0 IRQ 24 using DAC
[ 75.602209] blocks= 142253280 block_size= 512
[ 75.612192] heads=255, sectors=32, cylinders=17433
[ 75.612193]
[ 75.627768] blocks= 142253280 block_size= 512
[ 75.637572] heads=255, sectors=32, cylinders=17433
[ 75.637573]
[ 75.651139] cciss/c0d0: p1 p2 p3
[ 75.661297] ACPI: PCI Interrupt 0000:01:00.0[D] -> GSI 19 (level, low) -> IRQ 19
[ 75.676081] ohci_hcd 0000:01:00.0: OHCI Host Controller
[ 75.686642] ohci_hcd 0000:01:00.0: new USB bus registered, assigned bus number 1
[ 75.701399] ohci_hcd 0000:01:00.0: irq 19, io mem 0xf7cf0000
[ 75.774049] usb usb1: configuration #1 chosen from 1 choice
[ 75.785180] hub 1-0:1.0: USB hub found
[ 75.792658] hub 1-0:1.0: 3 ports detected
[ 75.901787] ACPI: PCI Interrupt 0000:01:00.1[D] -> GSI 19 (level, low) -> IRQ 19
[ 75.916575] ohci_hcd 0000:01:00.1: OHCI Host Controller
[ 75.927018] ohci_hcd 0000:01:00.1: new USB bus registered, assigned bus number 2
[ 75.941769] ohci_hcd 0000:01:00.1: irq 19, io mem 0xf7ce0000
[ 76.013673] usb usb2: configuration #1 chosen from 1 choice
[ 76.024808] hub 2-0:1.0: USB hub found
[ 76.032290] hub 2-0:1.0: 3 ports detected
[ 77.489165] qla4xxx 0000:05:07.1: Initializing firmware..
[ 83.968711] qla4xxx 0000:05:07.1: Initializing DDBs ...
[ 84.098501] qla4xxx 0000:05:07.1: DDB[2] MB0 4000 Tot 1 Next 0 State 0004 ConnErr 00000000 10.0.9.1:3260 "iqn.com.xxxxxxx.san-1"
[ 84.158404] qla4xxx 0000:05:07.1: DDB[2] MB0 4000 Tot 1 Next 0 State 0004 ConnErr 00000000 10.0.9.1:3260 "iqn.com.xxxxxxx.san-1"
[ 84.181634] qla4xxx 0000:05:07.1: DDB list done..
[ 84.308206] scsi0 : qla4xxx
[ 84.313881] iscsi: can not broadcast skb (-3)
[ 84.322577] connection0:0: Cannot notify userspace of session creation event. Check iscsi daemon
[ 84.340259] QLogic iSCSI HBA Driver version: 5.00.07-k1
[ 84.340261] QLogic ISP4022 @ 0000:05:07.1, host#=0, fw=02.00.00.45
[ 84.363537] ACPI: PCI Interrupt 0000:05:07.3[D] -> GSI 33 (level, low) -> IRQ 33
[ 84.378367] qla4xxx 0000:05:07.3: Found an ISP4022, irq 33, iobase 0xffffc20000034000
[ 84.394074] qla4xxx 0000:05:07.3: Configuring PCI space...
[ 84.437954] qla4xxx 0000:05:07.3: Configuring NVRAM ...
[ 84.521307] qla4xxx 0000:05:07.3: Starting firmware ...
[ 84.647613] qla4xxx 0000:05:07.3: Initializing firmware..
[ 89.030545] qla4xxx 0000:05:07.3: Initializing DDBs ...
[ 89.160335] qla4xxx 0000:05:07.3: DDB[2] MB0 4000 Tot 1 Next 0 State 0004 ConnErr 00000000 10.0.9.2:3260 "iqn.com.xxxxxxx.san-1"
[ 89.220237] qla4xxx 0000:05:07.3: DDB[2] MB0 4000 Tot 1 Next 0 State 0004 ConnErr 00000000 10.0.9.2:3260 "iqn.com.xxxxxxx.san-1"
[ 89.243464] qla4xxx 0000:05:07.3: DDB list done..
[ 89.370045] scsi1 : qla4xxx
[ 89.375704] iscsi: can not broadcast skb (-3)
[ 89.384390] connection1:0: Cannot notify userspace of session creation event. Check iscsi daemon
[ 89.402091] QLogic iSCSI HBA Driver version: 5.00.07-k1
[ 89.402093] QLogic ISP4022 @ 0000:05:07.3, host#=1, fw=02.00.00.45
[ 89.425353] QLogic iSCSI HBA Driver

After the feisty system boots up, there are no /dev/sda etc. devices created. If I try to use the qlogic CLI, it does not detect any qlogic HBAs in the system:

-------------------------------------------------------------
Current HBA/Port Information:
HBA Ser. Num.: Not Available HBA Port: -1 HBA Alias:
IP Address: Not Available Link: Not Available
Port Name: Not Available
Port Alias: Not Available
-------------------------------------------------------------

 1. Display General System Information
 2. Display Program Version Information
 3. List All QLogic iSCSI HBAs detected
 4. HBA Options Menu
 5. HBA Information
 6. Reset HBA
 7. Target Menu
 8. Import HBA Menu
 9. Diagnostic Menu
10. Display VPD Information
11. Set Working Adapter
12. Help
13. Exit
enter selection: 3
No HBAs Detected in system

Press the Enter key to continue.
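
Since the CLI reports no HBAs even though the boot log shows scsi0 and scsi1 being registered by qla4xxx, one way to cross-check from the shell is to look at sysfs directly. This is only a minimal sketch: it assumes the kernel exposes the standard proc_name attribute under /sys/class/scsi_host, and SCSI_HOST_ROOT and hosts_for_driver are illustrative names, not part of any QLogic tool.

```shell
#!/bin/sh
# Sketch only: list the SCSI hosts that a given driver registered.
# SCSI_HOST_ROOT and hosts_for_driver are hypothetical names for this example.
SCSI_HOST_ROOT="${SCSI_HOST_ROOT:-/sys/class/scsi_host}"

hosts_for_driver() {
    # $1 = driver name as reported in proc_name, e.g. qla4xxx
    for host in "$SCSI_HOST_ROOT"/host*; do
        # Skip hosts without a readable proc_name attribute
        [ -r "$host/proc_name" ] || continue
        if [ "$(cat "$host/proc_name")" = "$1" ]; then
            basename "$host"
        fi
    done
}
```

Running `hosts_for_driver qla4xxx` should list the hosts the driver registered, if any. An empty result would suggest the driver never registered a host at all, which is a different failure from hosts that exist but were never scanned.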

This is a suspicious error message that comes out when the qla4xxx driver is being loaded:
[ 247.215485] iscsi: can not broadcast skb (-3)
[ 247.224172] connection2:0: Cannot notify userspace of session creation event. Check iscsi daemon

So first... I don't know why the qla would be contacting a userspace iscsi daemon. I understand you could possibly run this card as a normal ethernet board, and have the kernel do the iSCSI work for you (I don't even know how you would do that with the qla4xxx driver)... but this is a $1200 USD board, and its purpose is to offload everything from the kernel. The kernel shouldn't even know these are iSCSI targets; they should just show up as SCSI targets.

So I installed open-iscsi hoping that would eliminate the error messages, and allow the module to fully load (the module DOES insert even with these error messages)... but installing open-iscsi didn't eliminate the errors.

I tried downloading the qlogic driver from their website, but it doesn't compile against the ubuntu feisty kernel...

Since none of the scsi devices were showing up, and the CLI doesn't even think there are any HBAs in the system... I assumed things were totally broken with this driver. So I spent a lot of time unloading it, loading it in different orders, with different parameters... etc. I would say about 50% of the time I rmmod qla4xxx, the system would kernel panic, and I would have to power cycle the box. This seems bad... but I don't know if it's related to the issue at hand.

After a lot of messing around, a lot of stress (the box had to work within about 12 hours), and considering reinstalling dapper or edgy, I finally found a workaround:

I created a new startup script, and I added this:

echo "Scanning QLogic Buses"
# "- - -" is a wildcard for channel, target and LUN, so each host rescans everything
for scanfile in $(find /sys/class/scsi_host/ -name scan); do echo "- - -" > "$scanfile"; done

sleep 10

Then I can manually fsck the devices and mount them in the same script.
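
A variation on this workaround is to poll for the device node rather than rely on a fixed sleep. The following is a hedged sketch, not what the startup script above does: SYSFS_ROOT, rescan_scsi_hosts and wait_for_device are made-up names, and the device and mount point in the usage note are placeholders.

```shell
#!/bin/sh
# Sketch only: rescan every SCSI host, then wait for a device node to appear.
# SYSFS_ROOT, rescan_scsi_hosts and wait_for_device are hypothetical names.
SYSFS_ROOT="${SYSFS_ROOT:-/sys/class/scsi_host}"

rescan_scsi_hosts() {
    # "- - -" means wildcard channel, target and LUN
    for scanfile in "$SYSFS_ROOT"/*/scan; do
        [ -e "$scanfile" ] || continue
        echo "- - -" > "$scanfile"
    done
}

wait_for_device() {
    # $1 = device path, $2 = timeout in seconds (default 30)
    dev="$1"; timeout="${2:-30}"; waited=0
    while [ ! -e "$dev" ]; do
        [ "$waited" -ge "$timeout" ] && return 1
        sleep 1
        waited=$((waited + 1))
    done
    return 0
}
```

In a startup script this might be chained as `rescan_scsi_hosts && wait_for_device /dev/sda1 30 && fsck -a /dev/sda1 && mount /dev/sda1 /mnt/san`, where /dev/sda1 and /mnt/san stand in for the real SAN partition and mount point. The polling loop returns as soon as the node exists, and fails visibly if the scan never produces it.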

So this works for now... However it means I can't have any of my SAN devices mounted by ubuntu from /etc/fstab... Which is the way I had it working in dapper.

This machine has also seen a lot of kernel panics, as described in #118833. Those kernel panics show problems in the I/O scheduler, but perhaps those are actually a symptom of the qla4xxx module being kinda broken? I do not know.

I can be available on IRC to help debug this issue. I am in the US/Eastern timezone. Let me know when to be online and I will be there.

Revision history for this message
Joe Kislo (joe-k12s) wrote :

Even with my workaround... the QLA4xxx driver isn't stable. The controller seems to die after the server has been up for a few days:

[490752.920691] qla4xxx 0000:05:07.3: scsi(1:0:2:1): ADAPTER RESET ISSUED.
[490752.943274] qla4xxx 0000:05:07.1: scsi(0:0:2:1): ADAPTER RESET ISSUED.
[490782.168479] qla4xxx 0000:05:07.3: HOST RESET FAILED.
[490782.168487] sd 1:0:2:1: scsi: Device offlined - not ready after error recovery
[490782.168491] sd 1:0:2:1: scsi: Device offlined - not ready after error recovery
[490782.168494] sd 1:0:2:1: scsi: Device offlined - not ready after error recovery
[490782.168496] sd 1:0:2:1: scsi: Device offlined - not ready after error recovery
[490782.168550] sd 1:0:2:1: SCSI error: return code = 0x06000000
[490782.168554] end_request: I/O error, dev sdb, sector 268182896
[490782.168558] device-mapper: multipath: Failing path 8:16.
[490782.168600] sd 1:0:2:1: rejecting I/O to offline device
[490782.179202] sd 1:0:2:1: rejecting I/O to offline device
[490782.189793] sd 1:0:2:1: rejecting I/O to offline device
[490782.200394] sd 1:0:2:1: rejecting I/O to offline device
[490782.210985] sd 1:0:2:1: rejecting I/O to offline device
[490782.221572] sd 1:0:2:1: rejecting I/O to offline device
[490782.232163] sd 1:0:2:1: rejecting I/O to offline device
[490782.242754] sd 1:0:2:1: rejecting I/O to offline device
[490782.253387] sd 1:0:2:1: SCSI error: return code = 0x00010000
[490782.253390] end_request: I/O error, dev sdb, sector 34413984
[490782.253473] sd 1:0:2:1: SCSI error: return code = 0x06000000
[490782.253476] end_request: I/O error, dev sdb, sector 268184216
[490782.253555] sd 1:0:2:1: SCSI error: return code = 0x06000000
[490782.253558] end_request: I/O error, dev sdb, sector 268186264
[490782.253624] sd 1:0:2:1: SCSI error: return code = 0x06000000
[490782.253627] end_request: I/O error, dev sdb, sector 268187296
[490782.254018] qla4xxx 0000:05:07.1: HOST RESET FAILED.
[490782.254022] sd 0:0:2:1: scsi: Device offlined - not ready after error recovery
[490782.254027] sd 0:0:2:1: scsi: Device offlined - not ready after error recovery
[490782.254030] sd 0:0:2:1: scsi: Device offlined - not ready after error recovery
[490782.254033] sd 0:0:2:1: scsi: Device offlined - not ready after error recovery
[490782.254035] sd 0:0:2:1: scsi: Device offlined - not ready after error recovery
[490782.254077] sd 0:0:2:1: SCSI error: return code = 0x06000000
[490782.254080] end_request: I/O error, dev sdf, sector 145109696
[490782.254083] device-mapper: multipath: Failing path 8:80.
[490782.254111] sd 0:0:2:1: rejecting I/O to offline device
[490782.264705] sd 0:0:2:1: rejecting I/O to offline device
[490782.275294] sd 0:0:2:1: rejecting I/O to offline device
[490782.285897] sd 0:0:2:1: rejecting I/O to offline device
[490782.296488] sd 0:0:2:1: rejecting I/O to offline device
[490782.307080] sd 0:0:2:1: rejecting I/O to offline device
[490782.317682] sd 0:0:2:1: rejecting I/O to offline device
[490782.328273] sd 0:0:2:1: rejecting I/O to offline device
[490782.338875] sd 0:0:2:1: rejecting I/O to offline device
[490782.349481] sd 0:0:2:1: rejecting I/O to offline device
[490782.360074] sd 0:0:2:1: rejecting I/O to offline device
[490782...


Wijnand (wijnand-cyso) wrote :

Hi,

we are having the same issues, not on Ubuntu but on Debian Sarge with a custom compiled 2.6.20 kernel package.
Can I ask you on what hardware your system is running (brand etc) and what kind of SAN you are using?

Joe Kislo (joe-k12s) wrote :

The system is an HP DL385 (2 CPU Opteron, single core). We're using a Promise VTrak M500i for the SAN.

rdobos (rdobos) wrote :

I'm having the same issues using a ql 4060c on an IBM 3650 server with Feisty after it updated.
It would still boot, but I couldn't address another LUN.
I'm still playing with it, trying to get it to boot from the SAN. I have been able to get the HBA running on 32-bit CentOS, but it doesn't address the 9GB of RAM in the system, so then I used 64-bit CentOS, but they must compile it differently. While it would boot from the SAN, I couldn't connect to the HBA using the CLI or SANSurfer to configure another LUN. I did get the 9GB of RAM in this one.

QLogic won't talk to me unless I'm running RedHat or Suse ES.

Using cutting edge hardware is fun, isn't it?

Joe Kislo (joe-k12s) wrote :

rdobos:

My workaround to echo "- - -" out to the card didn't help? Or are you trying to BOOT from the SAN?

With the driver that's included with the feisty kernel, the CLI won't work. It's one of the things that gets broken with the feisty kernel.

Joe Kislo (joe-k12s) wrote :

So QLogic's most recent driver on their website:

http://support.qlogic.com/support/EULATemplate/Template.aspx?TemplateID=1&path=http://download.qlogic.com/drivers/60676/qlaiscsi-linux-5.01.00.08-2-install.tgz

will compile against the feisty kernel... and it WILL work properly. It fixed these issues for me:
CLI works again
Bootup properly detects all SAN LUNs
System doesn't randomly kernel panic and crash (so far :) )

So I would highly recommend this driver be included with the gutsy kernel. IANAL, and I do not know if there are licensing issues, but the driver in the feisty kernel was from QLogic as well.

All:
If you do download the QLogic driver and install it, here's the procedure I used:

tar -xzvf qlaiscsi-linux-5.01.00.08-2-install.tgz
cd 5.01.00.08
extras/build.sh
extras/build.sh install
update-initramfs -u

*don't forget* the update-initramfs... otherwise you'll use the original driver on bootup
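
To confirm after a reboot that the newly built driver, and not the original in-kernel one, is the version actually loaded, something along these lines could be used. It is a sketch under assumptions: it relies on the module exporting its version at /sys/module/qla4xxx/version, the exact version string the QLogic build reports is not confirmed here, and MODULE_ROOT, loaded_driver_version and driver_is_version are hypothetical names.

```shell
#!/bin/sh
# Sketch only: check which version of a kernel module is currently loaded.
# MODULE_ROOT, loaded_driver_version and driver_is_version are hypothetical names.
MODULE_ROOT="${MODULE_ROOT:-/sys/module}"

loaded_driver_version() {
    # $1 = module name; prints its version, or returns 1 if unavailable
    vfile="$MODULE_ROOT/$1/version"
    [ -r "$vfile" ] || return 1
    cat "$vfile"
}

driver_is_version() {
    # $1 = module name, $2 = expected version prefix (e.g. 5.01.00.08)
    ver=$(loaded_driver_version "$1") || return 1
    case "$ver" in
        "$2"*) return 0 ;;
        *)     return 1 ;;
    esac
}
```

For example, `driver_is_version qla4xxx 5.01.00.08 || echo "qla4xxx is not the expected version"` would flag a boot that still picked up the old module from the initramfs.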

Don Faulkner (dfaulkner) wrote :

This remains a problem in Gutsy. I'm installing Xubuntu onto an IBM Bladeserver running a QLogic QMC4052, and I have similar results.
The echo "- - -" trick works, but I'm trying to boot from the SAN. I'm sure this could be made to work (inserting this before the mount -a operation, etc.), but given the erratic response that Joe Kislo reports, I'm reluctant to even try it.

The new QLogic Driver compiles fine on Xubuntu 7.10, so I'll be going that route. It's going to make kernel upgrades a pain, but it's the best option.

Add me to the requests to see this fixed "Real Soon Now(tm)."

Don Faulkner (dfaulkner) wrote :

Marking bug as confirmed.

I hope I'm not out of line in doing so. I see that I'm the second commenter on the bug (besides the author). I can confirm the author's observed behavior and workarounds. If that's insufficient, I'd be happy to respond to questions from someone on the appropriate team.

Don Faulkner (dfaulkner) wrote :

Marking bug as confirmed.

I hope I'm not out of line in doing so. I see that I'm the second commenter on the bug (besides the author). I can confirm the author's observed behavior and workarounds. If that's insufficient, I'd be happy to respond to questions from someone on the appropriate team. (oops. Guess I should have put that comment here!)

Changed in linux-source-2.6.20:
status: New → Confirmed

Don Faulkner (dfaulkner) wrote :

Turns out the 5.00.07 driver doesn't compile directly. It needs a slight patch for kernel header changes. I'm attaching the patch file.

I've also submitted this to the QLogic forums:
http://solutions.qlogic.com/KanisaSupportSite/forum/viewthread.do?command=FRShowThread&threadId=Post-15044086&frStartInd=1&NodeId=FB_HBA_LINUX_2_2&NodeName=iSCSI++Linux&TaxoName=FB_ForumBrowse

Don Faulkner (dfaulkner) wrote :

I goofed. There's a bug in the previously attached patch. A new patch file (same great name, now with fewer bugs!) is attached.

Alfredo Deza (arufuredosan) wrote :

We have changed the build.sh script in the newest qlogic driver to get rid of:

"pci_module_init"

And replaced that with:

"pci_register_driver"

The reason for this (it seems to be the same deal as Don Faulkner's patch) is that the old call was for kernels older than 2.6.20.

Alfredo Deza (arufuredosan) wrote :

Correction... we changed the "ql4_os.c" file.

Commented out this line:
        /* status = pci_module_init(&qla4xxx_pci_driver); */

To read this:
        status = pci_register_driver(&qla4xxx_pci_driver);
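
The same one-line change could also be applied from the shell before running the build. This is a rough sketch rather than the patch attached to this bug: patch_pci_init is a hypothetical helper name, and the location of ql4_os.c inside the unpacked driver source is left to the caller.

```shell
#!/bin/sh
# Sketch only: swap the old pci_module_init() call for pci_register_driver().
# patch_pci_init is a hypothetical helper; it keeps a .orig copy of the file.
patch_pci_init() {
    # $1 = path to ql4_os.c in the unpacked driver source
    src="$1"
    [ -f "$src" ] || return 1
    cp "$src" "$src.orig" || return 1
    # Rewrite from the backup so the original text stays available
    sed 's/pci_module_init(/pci_register_driver(/g' "$src.orig" > "$src"
}
```

After running `patch_pci_init path/to/ql4_os.c`, a `diff` between the file and its .orig copy should show only the renamed call. Note that this replaces the line outright instead of commenting out the old one.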

Joe Kislo (joe-k12s) wrote :

So if you try upgrading to hardy... this issue is still not resolved. You'll still need to download the driver from qlogic and install it per my earlier instructions... However!

Good luck getting it to compile. I spent 30 minutes looking through kernel commits to get it to compile, but was able to merge in most of the patch from somebody else against an older version of the driver. (Somebody named dcarley on the qlogic forum)

This patch applies against 5.01.01.04 only. Go to qlogic and download this newer driver, then apply my patch.

The driver has held up to 10 minutes of heavy I/O so far without issue. Most of the changes in the patch are minor, so I am assuming it will be stable. (jinx)

Launchpad Janitor (janitor) wrote : This bug is now reported against the 'linux' package

Beginning with the Hardy Heron 8.04 development cycle, all open Ubuntu kernel bugs need to be reported against the "linux" kernel package. We are automatically migrating this bug to the new "linux" package. However, development has already begun for the upcoming Intrepid Ibex 8.10 release. It would be helpful if you could test the upcoming release and verify if this is still an issue - http://www.ubuntu.com/testing . If the issue still exists, please update this report by changing the Status of the "linux" task from "Incomplete" to "New". We appreciate your patience and understanding as we make this transition. Thanks!

Leann Ogasawara (leannogasawara) wrote :

The Ubuntu Kernel Team is planning to move to the 2.6.27 kernel for the upcoming Intrepid Ibex 8.10 release. As a result, the kernel team would appreciate it if you could please test this newer 2.6.27 Ubuntu kernel. There are two ways you should be able to test:

1) If you are comfortable installing packages on your own, the linux-image-2.6.27-* package is currently available for you to install and test.

--or--

2) The upcoming Alpha5 for Intrepid Ibex 8.10 will contain this newer 2.6.27 Ubuntu kernel. Alpha5 is set to be released Thursday Sept 4. Please watch http://www.ubuntu.com/testing for Alpha5 to be announced. You should then be able to test via a LiveCD.

Please let us know immediately if this newer 2.6.27 kernel resolves the bug reported here or if the issue remains. More importantly, please open a new bug report for each new bug/regression introduced by the 2.6.27 kernel and tag the bug report with 'linux-2.6.27'. Also, please specifically note if the issue does or does not appear in the 2.6.26 kernel. Thanks again, we really appreciate your help and feedback.

Leann Ogasawara (leannogasawara) wrote :

*This is an automated response*

This bug report is being closed because we received no response to the previous request for information. Please reopen this if it is still an issue in the actively developed pre-release of Jaunty Jackalope 9.04 - http://cdimage.ubuntu.com/releases/jaunty . To reopen the bug report simply change the Status of the "linux" task back to "New".

Changed in linux:
status: Incomplete → Won't Fix