udlfd video driver hangs " BUG: scheduling while atomic " iscsi booted initrd : (trusty)

Bug #1279403 reported by john.d@servergy.com
8
This bug affects 1 person
Affects Status Importance Assigned to Milestone
linux (Ubuntu)
Invalid
High
Unassigned
linux-ppc (Ubuntu)
Invalid
High
Unassigned

Bug Description

Testing saucy iscsi target booted on e500mc default initrd hangs when tg3 driver started.

saucy was installed to an iSCSI target using the interactive setup from:

   http://ports.ubuntu.com/dists/saucy/main/installer-powerpc/current/images/e500mc/netboot/vmlinux
   http://ports.ubuntu.com/dists/saucy/main/installer-powerpc/current/images/e500mc/netboot/initrd.gz

Linux version 3.11.0-5-powerpc-e500mc
initrd.img-3.11.0-5-powerpc-e500mc

Booted from flash USB to simulate a pxe boot.

svy-boot#
svy-boot# mount /dev/sda1 /mnt
svy-boot# cd /mnt/iscsi2
svy-boot# ./go.sh
 /sbin/kexec --command='root=/dev/ram ro UUID=1cfe9f29-e188-4e00-ac47-3bf2245b117f ro break debug interface=eth4 console=tty0 console=ttyS0,115200 dhcp ISCSI_INITIATOR=iqn.1993-08.org.debian:01:53694d75a62e ISCSI_TARGET_NAME=iqn.2001-04.com.jade:storage.disk0 ISCSI_TARGET_IP=192.168.1.140 ISCSI_TARGET_PORT=3260 ISCSI_TARGET_GROUP=1 ' --initrd=/mnt/iscsi2/initrd.img-3.11.0-5-powerpc-e500mc -l /mnt/iscsi2/vmlinux-3.11.0-5-powerpc-e500mc

with break enabled I can enter cli in the initramfs

     -> This is embedded fsl eth1 ( 1Gbe ) with a cable :

(initramfs) ipconfig -t 10 -c dhcp -d eth1 &
(initramfs) IP-Config: eth1 hardware address 02:30:b0:00:10:01 mtu 1500 DHCP
IP-Config: no response after 10 secs - giving up

   -> System is alive ( I have a prompt ) - The above SHOULD HAVE WORKED since it is an active port.

This is embedded fsl eth1 - (10Gbe) - No cable

(initramfs) ipconfig -t 10 -c dhcp -d eth0 &
(initramfs)
(initramfs) IP-Config: eth0 hardware address 02:30:b0:00:10:00 mtu 1500 DHCP
IP-Config: no response after 10 secs - giving up

  -> System is alive ( I have a prompt)

            tg3 stand up card in pci-e slot:

(initramfs) ipconfig -t 10 -c dhcp -d eth4 &
(initramfs)
(initramfs) [ 218.394187] IPv6: ADDRCONF(NETDEV_UP): eth4: link is not ready
(initramfs)
(initramfs) [ 221.268533] tg3 0000:01:00.1 eth4: Link is up at 1000 Mbps, full duplex

< hung >

iSCSI target information:

Server: CentOS 6.4 .x64 .

#passwd setup
#
# tgtadm --lld iscsi --op new --mode account --user ronnie --password password
# tgtadm --lld iscsi --op bind --mode account --tid 1 --user ronnie
#
# dd if=/dev/zero of=iscsi.disk0 bs=1024M count=10

tgtadm --lld iscsi --op delete --mode target --tid 1

tgtadm --lld iscsi --op new --mode target --tid 1 -T iqn.2001-04.com.jade:storage.disk0
tgtadm --lld iscsi --op new --mode logicalunit --tid 1 --lun 1 -b `pwd`/iscsi.disk0

tgtadm --lld iscsi --op bind --mode target --tid 1 -I ALL
tgtadm --lld iscsi --op show --mode target

Revision history for this message
john.d@servergy.com (john-d-0) wrote :

Originally reportedas defect 1276298.

Revision history for this message
john.d@servergy.com (john-d-0) wrote :

retested with Trusty kernel 3-13-0-24 by doing a direct install to a iSCSI target;

 Stiil hangs ;

summary: - tg3 driver hangs in iscsi booted initrd : (saucy)
+ tg3 driver hangs in iscsi booted initrd : (trusty)
Revision history for this message
john.d@servergy.com (john-d-0) wrote : Re: tg3 driver hangs in iscsi booted initrd : (trusty)

I enabled kgdboc=ttyS0,115200 and caught this :

[ 71.145241] input: Barcode Reader as /devices/ffe000000.soc/ffe210000.usb/fsl-ehci.0/usb1/1-1/1-1.3/1-1.3:1.1/input/input1
[ 71.145493] hid-generic 0003:13BA:0018.0002: input,hidraw1: USB HID v1.10 Mouse [Barcode Reader ] on usb-fsl-ehci.0-1.3/input1
[ 71.206233] md: raid6 personality registered for level 6
[ 71.206235] md: raid5 personality registered for level 5
[ 71.206236] md: raid4 personality registered for level 4
[ 71.260531] md: raid10 personality registered for level 10
[ 71.664609] udlfb: DisplayLink USB device /dev/fb0 attached. 1440x900 resolution. Using 5064K framebuffer memory
[ 71.934176] IPv6: ADDRCONF(NETDEV_UP): eth2: link is not ready
[ 72.083988] IPv6: ADDRCONF(NETDEV_UP): eth3: link is not ready
[ 73.549288] tg3 0000:01:00.0 eth2: Link is up at 100 Mbps, full duplex

Entering kdb (current=0xe62b8000, pid 74) on processor 1 Oops: (null)
due to oops @ 0x0
dCPU: 1 PID: 74 Comm: kworker/1:1 Tainted: G W 3.13.9+ #3
dWorkqueue: events linkwatch_event
dtask: e62b8000 ti: e6396000 task.ti: e6396000
NIP: 00000000 LR: c00885ac CTR: 00000000
REGS: e6397ca0 TRAP: 0400 Tainted: G W (3.13.9+)
MSR: 00021002 <CE,ME> CR: 22ad2e28 XER: 00000000

GPR00: c00885ac e6397d50 e62b8000 d1826900 c0b0b340 00000005 448130bc 00000011
GPR08: 00000000 00000000 00000011 204c2f00 00000000 00000000 c0078250 e62c7920
GPR16: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 d1826680
GPR24: c0b60000 10d20000 c0b06900 c0b06900 c0b80e18 e6396000 d1826900 c0b0b340
NIP [00000000] (null)
LR [c00885ac] ttwu_do_activate.constprop.70+0x4c/0x90
Call Trace:
[e6397d50] [c00885ac] ttwu_do_activate.constprop.70+0x4c/0x90 (unreliable)
[e6397d60] [c0089d80] scheduler_ipi+0xb0/0x1e0
[e6397d90] [c00126d0] smp_ipi_demux+0xa0/0xf0
[e6397db0] [c0010250] doorbell_exception+0x70/0xa0
[e6397dd0] [c0010bb0] ret_from_except_full+0x0/0x4c
--- Exception: 2070 at linkwatch_event+0x0/0x50
    LR = process_one_work+0x14c/0x3e0
[e6397e90] [c006fef8] process_one_work+0x128/0x3e0 (unreliable)
[e6397ec0] [c00708d4] worker_thread+0x134/0x3e0
[e6397ef0] [c0078324] kthread+0xd4/0xf0
[e6397f40] [c001068c] ret_from_kernel_thread+0x5c/0x64

Revision history for this message
john.d@servergy.com (john-d-0) wrote :
Download full text (3.8 KiB)

Seems the video driver calling timeout in the wrong context is the root cause:

>[ 45.638269] Call Trace:
<4>[ 45.638273] [e69a38a0] [c00080bc] show_stack+0xfc/0x1c0 (unreliable)
<4>[ 45.638277] [e69a38f0] [c07fd050] dump_stack+0x78/0xa0
<4>[ 45.638285] [e69a3900] [c008eb50] dequeue_task_idle+0x40/0x60
<4>[ 45.638288] [e69a3910] [c07ebc7c] __schedule+0x4dc/0x780
<4>[ 45.638292] [e69a3a20] [c07eaf64] schedule_timeout+0x144/0x260
<4>[ 45.638295] [e69a3a80] [c07ee368] __down_timeout+0x78/0xc0
<4>[ 45.638299] [e69a3ab0] [c00a1834] down_timeout+0x74/0x80
<4>[ 45.638303] [e69a3ad0] [c0439f0c] dlfb_get_urb+0x2c/0xc0
<4>[ 45.638306] [e69a3af0] [c043b5e4] dlfb_handle_damage.isra.3+0xa4/0x1f0
<4>[ 45.638309] [e69a3b30] [c0419934] soft_cursor+0x1b4/0x2a0
<4>[ 45.638313] [e69a3b60] [c041935c] bit_cursor+0x52c/0x550
<4>[ 45.638316] [e69a3bf0] [c0415400] fbcon_cursor+0x130/0x1b0
<4>[ 45.638319] [e69a3c20] [c046f098] vt_console_print+0x248/0x4a0
<4>[ 45.638322] [e69a3c70] [c00a6ab0] call_console_drivers.constprop.18+0xf0/0x170
<4>[ 45.638325] [e69a3c90] [c00a73a4] console_unlock+0x474/0x4a0
<4>[ 45.638329] [e69a3cc0] [c00a76c4] vprintk_emit+0x2f4/0x500
<4>[ 45.638332] [e69a3d10] [c0499018] dev_vprintk_emit+0x58/0x70
<4>[ 45.638336] [e69a3db0] [c0499084] dev_printk_emit+0x54/0x70
<4>[ 45.638339] [e69a3de0] [c06db0a4] netdev_info+0x74/0x90
<4>[ 45.638346] [e69a3e20] [f1c6ead8] tg3_link_report+0xb8/0x170 [tg3]
<4>[ 45.638351] [e69a3e30] [f1c6f888] tg3_test_and_report_link_chg+0x58/0xa0 [tg3]
<4>[ 45.638358] [e69a3e40] [f1c79ef4] tg3_setup_phy+0xf74/0x2450 [tg3]
<4>[ 45.638364] [e69a3ee0] [f1c7f6b8] tg3_poll+0x398/0x430 [tg3]
<4>[ 45.638367] [e69a3f30] [c06d9954] net_rx_action+0x144/0x250
<4>[ 45.638370] [e69a3f80] [c0055758] __do_softirq+0x118/0x2b0
<4>[ 45.638374] [e69a3fe0] [c0055d04] irq_exit+0xb4/0xf0
<4>[ 45.638377] [e69a3ff0] [c000e8d0] call_do_irq+0x24/0x3c
<4>[ 45.638380] [c0b75e90] [c00054c8] do_IRQ+0x98/0x110
<4>[ 45.638383] [c0b75eb0] [c0010bfc] ret_from_except+0x0/0x18
<4>[ 45.638388] --- Exception: 501 at arch_cpu_idle+0x30/0x80
<4>[ 45.638388] LR = arch_cpu_idle+0x30/0x80
<4>[ 45.638392] [c0b75f70] [c00b5bd8] rcu_idle_enter+0xb8/0x100 (unreliable)
<4>[ 45.638395] [c0b75f80] [c00a93a0] cpu_startup_entry+0x160/0x260
<4>[ 45.638398] [c0b75fc0] [c0a887ec] start_kernel+0x33c/0x350
<4>[ 45.638401] [c0b75ff0] [c00003fc] skpinv+0x2e8/0x324
<1>[ 45.638412] Unable to handle kernel paging request for instruction fetch
<1>[ 45.638413] Faulting instruction address: 0x00000000
[dumpcommon]kdb> -bt

Stack traceback for pid 74
0xe62b8000 74 2 1 1 R 0xe62b8340 *kworker/1:1
Call Trace:
[e63978e0] [c00080bc] show_stack+0xfc/0x1c0 (unreliable)
[e6397930] [c00fcb28] kdb_show_stack+0x88/0xb0
[e6397950] [c00fcc04] kdb_bt1.isra.0+0xb4/0x140
[e6397980] [c00fd010] kdb_bt+0x380/0x450
[e6397a10] [c00f9c50] kdb_parse+0x270/0x720
[e6397a60] [c00fa1d4] kdb_exec_defcmd+0xd4/0x140
[e6397a80] [c00f9c50] kdb_parse+0x270/0x720
[e6397ad0] [c00fa7b8] kdb_main_loop+0x348/0x7c0
[e6397b30] [c00fdee8] kdb_stub+0x248/0x4c0
[e6397b70] [c00f2a10] kgdb_cpu_enter+0x3c0/0x6a0
[e6397bd0] [c00...

Read more...

Revision history for this message
john.d@servergy.com (john-d-0) wrote :

i back ported a fix from Alexander Holler:

The console functions are using spinlocks while calling fb-driver ops
but udlfb waits for a semaphore in many ops. This results in the BUG
"scheduling while atomic". One of those call flows is e.g.

vt_console_print() (spinlock printing_lock)
 (...)
 dlfb_ops_imageblit()
                        dlfb_handle_damage()
                                dlfb_get_urb()
     down_timeout(semaphore)
BUG: scheduling while atomic
(...)
vt_console_print() (release spinlock printing_lock)

Fix this through a workqueue for dlfb_handle_damage().

Cc: <stable <at> vger.kernel.org>
Signed-off-by: Alexander Holler <holler <at> ahsoftware.de>

And it cured the hang.

Revision history for this message
john.d@servergy.com (john-d-0) wrote :
summary: - tg3 driver hangs in iscsi booted initrd : (trusty)
+ udlfd video driver hangs in iscsi booted initrd : (trusty)
tags: added: patch
Changed in linux-ppc (Ubuntu):
status: New → Confirmed
importance: Undecided → High
Changed in linux (Ubuntu):
status: New → Confirmed
importance: Undecided → High
tags: added: kernel-da-key saucy
Revision history for this message
Joseph Salisbury (jsalisbury) wrote : Re: udlfd video driver hangs in iscsi booted initrd : (trusty)

Discussion of patch upstream:
https://lkml.org/lkml/2013/1/5/81

The patch was cc'd to stable, so it will make it's way into the Ubuntu kernels through the usual stable update process. It just needs to make it into Linus' tree first, then down through upstream stable.

Changed in linux (Ubuntu):
status: Confirmed → Triaged
Changed in linux-ppc (Ubuntu):
status: Confirmed → Triaged
tags: added: trusty utopic
summary: - udlfd video driver hangs in iscsi booted initrd : (trusty)
+ udlfd video driver hangs " BUG: scheduling while atomic " iscsi booted
+ initrd : (trusty)
Revision history for this message
john.d@servergy.com (john-d-0) wrote :

Servergy closed in 2015 after SEC stock fraud scandal. This bug is no longer relevant.

Changed in linux (Ubuntu):
status: Triaged → Invalid
Changed in linux-ppc (Ubuntu):
status: Triaged → Invalid
To post a comment you must log in.
This report contains Public information  
Everyone can see this information.

Other bug subscribers

Remote bug watches

Bug watches keep track of this bug in other bug trackers.