QLogic 4052C (qla4xxx) controller does not scan the iSCSI bus properly in feisty kernel
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
linux (Ubuntu) | Won't Fix | Undecided | Unassigned |
linux-source-2.6.20 (Ubuntu) | Won't Fix | Undecided | Unassigned |
Bug Description
Binary package hint: linux-source-2.6.20
We have a QLogic 4052C in a number of servers. It's an iSCSI HBA that offloads both TCP and iSCSI processing. Basically it makes your iSCSI targets show up as normal /dev/sda devices.
Before the feisty kernel, we had to compile the qlogic driver from the qlogic website. Everything was good times, everything was very stable. I also had to compile multipath from source, because of a bug in previous versions of multipath (fixed in feisty). I could add my SAN partitions into /etc/fstab, and everything would work.
Starting in feisty, the qla4xxx driver has been included in the kernel... But it doesn't work out of the box... and it took me a long time to figure out a workaround.
First, when the qla4xxx driver loads, I get these messages:
[ 75.281797] iscsi: registered transport (qla4xxx)
[ 75.292278] HP CISS Driver (v 3.6.14)
[ 75.305738] ohci_hcd: 2006 August 04 USB 1.1 'Open' Host Controller (OHCI) Driver
[ 75.336293] eth3: Tigon3 [partno(349321-001) rev 2100 PHY(5704)] (PCIX:133MHz:
[ 75.359891] eth3: RXcsums[1] LinkChgREG[0] MIirq[0] ASF[0] Split[0] WireSpeed[1] TSOcap[1]
[ 75.376539] eth3: dma_rwctrl[
[ 75.387043] md: raid10 personality registered for level 10
[ 75.387047] ACPI: PCI Interrupt 0000:05:07.1[B] -> GSI 35 (level, low) -> IRQ 35
[ 75.412826] ACPI: PCI Interrupt 0000:02:04.0[A] -> GSI 24 (level, low) -> IRQ 24
[ 75.427676] qla4xxx 0000:05:07.1: Found an ISP4022, irq 35, iobase 0xffffc2000001c000
[ 75.445972] Losing some ticks... checking if CPU frequency changed.
[ 75.446092] qla4xxx 0000:05:07.1: Configuring PCI space...
[ 75.459721] qla4xxx 0000:05:07.1: Configuring NVRAM ...
[ 75.545962] qla4xxx 0000:05:07.1: Starting firmware ...
[ 75.559089] cciss0: <0x46> at PCI 0000:02:04.0 IRQ 24 using DAC
[ 75.602209] blocks= 142253280 block_size= 512
[ 75.612192] heads=255, sectors=32, cylinders=17433
[ 75.612193]
[ 75.627768] blocks= 142253280 block_size= 512
[ 75.637572] heads=255, sectors=32, cylinders=17433
[ 75.637573]
[ 75.651139] cciss/c0d0: p1 p2 p3
[ 75.661297] ACPI: PCI Interrupt 0000:01:00.0[D] -> GSI 19 (level, low) -> IRQ 19
[ 75.676081] ohci_hcd 0000:01:00.0: OHCI Host Controller
[ 75.686642] ohci_hcd 0000:01:00.0: new USB bus registered, assigned bus number 1
[ 75.701399] ohci_hcd 0000:01:00.0: irq 19, io mem 0xf7cf0000
[ 75.774049] usb usb1: configuration #1 chosen from 1 choice
[ 75.785180] hub 1-0:1.0: USB hub found
[ 75.792658] hub 1-0:1.0: 3 ports detected
[ 75.901787] ACPI: PCI Interrupt 0000:01:00.1[D] -> GSI 19 (level, low) -> IRQ 19
[ 75.916575] ohci_hcd 0000:01:00.1: OHCI Host Controller
[ 75.927018] ohci_hcd 0000:01:00.1: new USB bus registered, assigned bus number 2
[ 75.941769] ohci_hcd 0000:01:00.1: irq 19, io mem 0xf7ce0000
[ 76.013673] usb usb2: configuration #1 chosen from 1 choice
[ 76.024808] hub 2-0:1.0: USB hub found
[ 76.032290] hub 2-0:1.0: 3 ports detected
[ 77.489165] qla4xxx 0000:05:07.1: Initializing firmware..
[ 83.968711] qla4xxx 0000:05:07.1: Initializing DDBs ...
[ 84.098501] qla4xxx 0000:05:07.1: DDB[2] MB0 4000 Tot 1 Next 0 State 0004 ConnErr 00000000 10.0.9.1:3260 "iqn.com.
[ 84.158404] qla4xxx 0000:05:07.1: DDB[2] MB0 4000 Tot 1 Next 0 State 0004 ConnErr 00000000 10.0.9.1:3260 "iqn.com.
[ 84.181634] qla4xxx 0000:05:07.1: DDB list done..
[ 84.308206] scsi0 : qla4xxx
[ 84.313881] iscsi: can not broadcast skb (-3)
[ 84.322577] connection0:0: Cannot notify userspace of session creation event. Check iscsi daemon
[ 84.340259] QLogic iSCSI HBA Driver version: 5.00.07-k1
[ 84.340261] QLogic ISP4022 @ 0000:05:07.1, host#=0, fw=02.00.00.45
[ 84.363537] ACPI: PCI Interrupt 0000:05:07.3[D] -> GSI 33 (level, low) -> IRQ 33
[ 84.378367] qla4xxx 0000:05:07.3: Found an ISP4022, irq 33, iobase 0xffffc20000034000
[ 84.394074] qla4xxx 0000:05:07.3: Configuring PCI space...
[ 84.437954] qla4xxx 0000:05:07.3: Configuring NVRAM ...
[ 84.521307] qla4xxx 0000:05:07.3: Starting firmware ...
[ 84.647613] qla4xxx 0000:05:07.3: Initializing firmware..
[ 89.030545] qla4xxx 0000:05:07.3: Initializing DDBs ...
[ 89.160335] qla4xxx 0000:05:07.3: DDB[2] MB0 4000 Tot 1 Next 0 State 0004 ConnErr 00000000 10.0.9.2:3260 "iqn.com.
[ 89.220237] qla4xxx 0000:05:07.3: DDB[2] MB0 4000 Tot 1 Next 0 State 0004 ConnErr 00000000 10.0.9.2:3260 "iqn.com.
[ 89.243464] qla4xxx 0000:05:07.3: DDB list done..
[ 89.370045] scsi1 : qla4xxx
[ 89.375704] iscsi: can not broadcast skb (-3)
[ 89.384390] connection1:0: Cannot notify userspace of session creation event. Check iscsi daemon
[ 89.402091] QLogic iSCSI HBA Driver version: 5.00.07-k1
[ 89.402093] QLogic ISP4022 @ 0000:05:07.3, host#=1, fw=02.00.00.45
[ 89.425353] QLogic iSCSI HBA Driver
After the feisty system boots up, there are no /dev/sda etc. devices created. If I try to use the qlogic CLI, it does not detect any qlogic HBAs in the system:
-------
Current HBA/Port Information:
HBA Ser. Num.: Not Available HBA Port: -1 HBA Alias:
IP Address: Not Available Link: Not Available
Port Name: Not Available
Port Alias: Not Available
-------
1. Display General System Information
2. Display Program Version Information
3. List All QLogic iSCSI HBAs detected
4. HBA Options Menu
5. HBA Information
6. Reset HBA
7. Target Menu
8. Import HBA Menu
9. Diagnostic Menu
10. Display VPD Information
11. Set Working Adapter
12. Help
13. Exit
enter selection: 3
No HBAs Detected in system
Press the Enter key to continue.
This is a suspicious error message that comes out when the qla4xxx driver is being loaded:
[ 247.215485] iscsi: can not broadcast skb (-3)
[ 247.224172] connection2:0: Cannot notify userspace of session creation event. Check iscsi daemon
So first... I don't know why the qla would be contacting a userspace iscsi daemon. I understand you could possibly run this card as a normal ethernet board, and have the kernel do the iSCSI work for you (I don't even know how you would do that with the qla4xxx driver)... but this is a $1200USD board, and its purpose is to offload everything from the kernel. The kernel shouldn't even know these are iSCSI targets; they should just show up as scsi targets.
So I installed open-iscsi hoping that would eliminate the error messages, and allow the module to fully load (the module DOES insert even with these error messages)... but installing open-iscsi didn't eliminate the errors.
I tried downloading the qlogic driver from their website, but it doesn't compile against the ubuntu feisty kernel...
Since none of the scsi devices were showing up, and the cli doesn't even think there are any HBAs in the system... I assumed things were totally broken with this driver. So I spent a lot of time unloading it, loading it in different orders, with different parameters... etc. I would say about 50% of the time I rmmod qla4xxx, the system would kernel panic, and I would have to power cycle the box. This seems bad... but I don't know if it's related to the issue at hand.
After a lot of messing around, a lot of stress (the box had to work within about 12 hours), and considering reinstalling dapper or edgy, I finally found a workaround:
I created a new startup script, and I added this:
echo "Scanning QLogic Buses"
for scanfile in `find /sys/class/
sleep 10
Then I can manually fsck the devices and mount them in the same script.
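The scan loop as pasted in this report is cut off, so the exact path it used is not recoverable here. For anyone trying to reproduce the idea, here is a minimal sketch of a rescan loop. It assumes the standard SCSI sysfs layout (`/sys/class/scsi_host/hostN/scan`) and assumes the driver identifies itself as `qla4xxx` in each host's `proc_name` file; the function name `rescan_qla_hosts` and its optional sysfs-root argument are made up for illustration and are not part of the original script.

```shell
#!/bin/sh
# Sketch only: the path in the original script is truncated, so the
# scsi_host location used here is an assumption, not the exact command.

# rescan_qla_hosts [SYSFS_ROOT]
# Writes the wildcard "- - -" (all channels, targets and LUNs) to the scan
# file of every SCSI host whose proc_name is qla4xxx.
rescan_qla_hosts() {
    sysfs_root="${1:-/sys}"
    for host in "$sysfs_root"/class/scsi_host/host*; do
        # Skip an unmatched glob and hosts without a proc_name attribute.
        [ -f "$host/proc_name" ] || continue
        # Leave hosts that belong to other drivers alone.
        [ "$(cat "$host/proc_name")" = "qla4xxx" ] || continue
        echo "Scanning $(basename "$host")"
        echo "- - -" > "$host/scan"
    done
}
```

Calling `rescan_qla_hosts` with no argument as root, followed by a `sleep 10` as in the script above, should give the devices time to appear before any fsck or mount step. The sysfs-root argument is only there so the loop can be tried against a scratch directory first.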
So this works for now... However it means I can't have any of my SAN devices mounted by ubuntu from /etc/fstab... Which is the way I had it working in dapper.
This machine has also seen a LOT of kernel panics, as described in #118833. Those kernel panics show problems in the I/O scheduler, but perhaps those are actually a symptom of the qla4xxx module being kinda broken? I do not know.
I can be available on IRC to help debug this issue. I am in the US/Eastern timezone. Let me know when to be online and I will be there.
Even with my workaround... the qla4xxx driver isn't stable. The controller seems to die after the server has been up for a few days:
[490752.920691] qla4xxx 0000:05:07.3: scsi(1:0:2:1): ADAPTER RESET ISSUED.
[490752.943274] qla4xxx 0000:05:07.1: scsi(0:0:2:1): ADAPTER RESET ISSUED.
[490782.168479] qla4xxx 0000:05:07.3: HOST RESET FAILED.
[490782.168487] sd 1:0:2:1: scsi: Device offlined - not ready after error recovery
[490782.168491] sd 1:0:2:1: scsi: Device offlined - not ready after error recovery
[490782.168494] sd 1:0:2:1: scsi: Device offlined - not ready after error recovery
[490782.168496] sd 1:0:2:1: scsi: Device offlined - not ready after error recovery
[490782.168550] sd 1:0:2:1: SCSI error: return code = 0x06000000
[490782.168554] end_request: I/O error, dev sdb, sector 268182896
[490782.168558] device-mapper: multipath: Failing path 8:16.
[490782.168600] sd 1:0:2:1: rejecting I/O to offline device
[490782.179202] sd 1:0:2:1: rejecting I/O to offline device
[490782.189793] sd 1:0:2:1: rejecting I/O to offline device
[490782.200394] sd 1:0:2:1: rejecting I/O to offline device
[490782.210985] sd 1:0:2:1: rejecting I/O to offline device
[490782.221572] sd 1:0:2:1: rejecting I/O to offline device
[490782.232163] sd 1:0:2:1: rejecting I/O to offline device
[490782.242754] sd 1:0:2:1: rejecting I/O to offline device
[490782.253387] sd 1:0:2:1: SCSI error: return code = 0x00010000
[490782.253390] end_request: I/O error, dev sdb, sector 34413984
[490782.253473] sd 1:0:2:1: SCSI error: return code = 0x06000000
[490782.253476] end_request: I/O error, dev sdb, sector 268184216
[490782.253555] sd 1:0:2:1: SCSI error: return code = 0x06000000
[490782.253558] end_request: I/O error, dev sdb, sector 268186264
[490782.253624] sd 1:0:2:1: SCSI error: return code = 0x06000000
[490782.253627] end_request: I/O error, dev sdb, sector 268187296
[490782.254018] qla4xxx 0000:05:07.1: HOST RESET FAILED.
[490782.254022] sd 0:0:2:1: scsi: Device offlined - not ready after error recovery
[490782.254027] sd 0:0:2:1: scsi: Device offlined - not ready after error recovery
[490782.254030] sd 0:0:2:1: scsi: Device offlined - not ready after error recovery
[490782.254033] sd 0:0:2:1: scsi: Device offlined - not ready after error recovery
[490782.254035] sd 0:0:2:1: scsi: Device offlined - not ready after error recovery
[490782.254077] sd 0:0:2:1: SCSI error: return code = 0x06000000
[490782.254080] end_request: I/O error, dev sdf, sector 145109696
[490782.254083] device-mapper: multipath: Failing path 8:80.
[490782.254111] sd 0:0:2:1: rejecting I/O to offline device
[490782.264705] sd 0:0:2:1: rejecting I/O to offline device
[490782.275294] sd 0:0:2:1: rejecting I/O to offline device
[490782.285897] sd 0:0:2:1: rejecting I/O to offline device
[490782.296488] sd 0:0:2:1: rejecting I/O to offline device
[490782.307080] sd 0:0:2:1: rejecting I/O to offline device
[490782.317682] sd 0:0:2:1: rejecting I/O to offline device
[490782.328273] sd 0:0:2:1: rejecting I/O to offline device
[490782.338875] sd 0:0:2:1: rejecting I/O to offline device
[490782.349481] sd 0:0:2:1: rejecting I/O to offline device
[490782.360074] sd 0:0:2:1: rejecting I/O to offline device
[490782...