here is the first weird thing, reboot_code_buffer is NULL now, advance some more till line 122.
at this point the kernel relocation code (arch/arm/kernel/relocate_kernel.S::relocate_new_kernel()) it has already been copied to reboot_code_buffer (see the memcpy at line 105), and the caches have been flushed, so i expect to find that routine there - and dumping the memory confirms it:
(gdb) x/20 0xee4af000 (< this is the relocate_code_buffer value AKA reboot_code_buffer as the MMU sees it before turning 0 - see above)
0xee4af000: 0xe59f0074 0xe59f106c 0xe3500000 0x0a000014
0xee4af010: 0xe4903004 0xe3130001 0x0a000001 0xe3c34001
0xee4af020: 0xeafffffa 0xe3130002 0x0a000001 0xe3c30002
0xee4af030: 0xeafffff6 0xe3130004 0x0a000000 0xea000008
0xee4af040: 0xe3130008 0x0afffff1 0xe3c33008 0xe3a06b01
dump the physical memory via openocd to double check:
this is NOT the relocation routine that was copied with memcpy(), and that is NOT the memory location we should jump!
let's stepi a bit more:
(gdb) stepi
0xc0051c60 126 cpu_reset(reboot_code_buffer_phys);
(gdb) stepi
cpu_v7_reset () at /home/flag/canonical/ubuntu-natty/arch/arm/mm/proc-v7.S:64
64 mov pc, r0
(gdb) p/x $r0
$8 = 0x815d8000
(gdb) stepi
0x815d8000 in ?? ()
(gdb) c
Continuing.
and here is the panic:
[ 518.057678] omap_device: omap-aess-audio.-1: new worst case activate latency 0: 152587
[ 518.067565] omap_device: omap-aess-audio.-1: new worst case deactivate latency 0: 61035
[ 518.089172] Internal error: Oops - undefined instruction: 0 [#1] PREEMPT
[ 518.096221] last sysfs file: /sys/kernel/kexec_loaded
[ 518.101501] Modules linked in: wl12xx wl12xx_sdio btsdio
[ 518.107086] CPU: 0 Tainted: G W (2.6.38-1208-omap4 #11)
[ 518.113739] PC is at 0x815dc040
[ 518.117065] LR is at machine_kexec+0x15c/0x198
[ 518.121704] pc : [<815dc040>] lr : [<c0051c64>] psr: 600000d3
[ 518.121704] sp : c170de50 ip : 000098c9 fp : 000098c9
[ 518.133758] r10: 00000105 r9 : 00000000 r8 : 815d8000
[ 518.139221] r7 : c170c000 r6 : 00008000 r5 : 815d8000 r4 : c0051e24
[ 518.146057] r3 : 00000002 r2 : 00000005 r1 : 00008000 r0 : 00000000
[ 518.152893] Flags: nZCv IRQs off FIQs off Mode SVC_32 ISA ARM Segment user
[ 518.160552] Control: 10c5387d Table: ae4dc059 DAC: 00000015
[ 518.166564] Process kexec (pid: 542, stack limit = 0xc170c2f0)
[ 518.172668] Stack: (0xc170de50 to 0xc170e000)
[ 518.177246] de40: ee4af000 ae4af000 00000088 00000000
[ 518.185821] de60: c08a64ac c178cc00 fee1dead 00000000 c004c528 c00c091c 45584543 45584543
[ 518.194396] de80: c0835ed0 c009ec90 00000002 c04a8908 ef9cb800 00000002 00000000 ffffffef
[ 518.202972] dea0: 00000000 c05d45b0 ef9cb800 00001001 00000000 00000000 c160b000 c00abc58
[ 518.211517] dec0: 00000000 00000002 ef9cb800 c04a294c ef9cb800 00001003 c170ded8 c170ded8
[ 518.220092] dee0: 00000002 00000000 ee584f40 c04f50bc ee48a220 ee48a220 ef4adc00 ef6eef00
[ 518.228668] df00: ffffffff c0164480 ef4adc00 00000000 ef6eef00 00000008 ef4adc20 00000000
[ 518.237243] df20: ef6eef00 00000000 ef6eef00 c0143a2c 00000014 ef6eef00 00000000 c0143c58
[ 518.245819] df40: c1608e00 ef4adc20 ef6eef00 c0132600 00000000 00000000 c1608e14 c1608e00
[ 518.254394] df60: 00000000 c160ccc0 000241b8 c014c224 c1608e00 00000000 c160ccc0 c012eebc
[ 518.262939] df80: 00000003 00000001 c170c000 cdcefa50 00000002 400e95e7 00000000 40101fd0
[ 518.271514] dfa0: 00000058 c004c340 400e95e7 00000000 fee1dead 28121969 45584543 00000000
[ 518.280090] dfc0: 400e95e7 00000000 40101fd0 00000058 00000000 00000001 000241c8 00000105
[ 518.288665] dfe0: be967728 be967718 0000ab4b 401a47b0 80000010 fee1dead 00000000 00000000
[ 518.297271] [<c0051c64>] (machine_kexec+0x15c/0x198) from [<c00c091c>] (kernel_kexec+0xd8/0x104)
[ 518.306518] [<c00c091c>] (kernel_kexec+0xd8/0x104) from [<c009ec90>] (sys_reboot+0x190/0x214)
[ 518.315460] [<c009ec90>] (sys_reboot+0x190/0x214) from [<c004c340>] (ret_fast_syscall+0x0/0x30)
[ 518.324584] Code: 00000000 00000000 c15dc038 c15dc038 (ee4f1160)
[ 518.331207] usb 1-1.1: ifconfig timed out on ep0out len=0/4
[ 518.337066] smsc95xx 1-1.1:1.0: eth0: Failed to write register index 0x00000014
init: rsyslog main process (347) killed by SEGV signal
init: rsyslog main process ended, respawning
[ 518.373504] smsc95xx 1-1.1:1.0: eth0: Failed to write HW_CFG_LRST_ bit in HW_CFG register, ret = -110
[ 518.549499] ---[ end trace da227214a82491ba ]---
to me, it looks like a memory (stack?) corruption in the final part of machine_kexec(), anyone with mmu/openocd/gdb knowledge is welcome at this point.
looks like a corruption is happening in machine_ kexec.c: machine_ kexec() in the final part - let's see a debugging session:
reboot the panda board and connect via openocd+gdb:
(gdb) l machine_kexec,+50 code_buffer_ phys; code_buffer; code_buffer_ phys = pfn(image- >control_ code_page) << PAGE_SHIFT; image-> control_ code_page) ; code_buffer* / on_page = page_list; ZIMAGE_ OFFSET + KEXEC_ARM_ ATAGS_OFFSET; reboot_ code_buffer, new_kernel, relocate_ new_kernel_ size); address: 0x%08lx reboot_code_buffer: %p reboot_ code_buffer_ phys: 0x%08lx relocate_ new_kernel_ size: %u\n", address, reboot_code_buffer, reboot_ code_buffer_ phys, relocate_ new_kernel_ size); range(( unsigned long) reboot_code_buffer, PAGE_SIZE) ; disable( ); disable( ); for_reboot( 0); /* mode is not used, so just pass 0*/ reboot_ code_buffer_ phys); canonical/ ubuntu- natty/arch/ arm/kernel/ machine_ kexec.c, line 111.
84 {
85 unsigned long page_list;
86 unsigned long reboot_
87 void *reboot_
88
89 printk("%s::%d\n", __FUNCTION__, __LINE__);
90 page_list = image->head & PAGE_MASK;
91
92 /* we need both effective and real address here */
93 reboot_
94 page_to_
95 reboot_code_buffer = page_address(
96
97 /* Prepare parameters for reboot_
98 kexec_start_address = image->start;
99 kexec_indirecti
100 kexec_mach_type = machine_arch_type;
101 kexec_boot_atags = image->start - KEXEC_ARM_
102 printk("%s::%d\n", __FUNCTION__, __LINE__);
103
104 /* copy our kernel relocation code to the control code page */
105 memcpy(
106 relocate_
107
108 printk("%s::%d kexec_start_
109 __FUNCTION__, __LINE__, kexec_start_
110
111 flush_icache_
112 (unsigned long) reboot_code_buffer + KEXEC_CONTROL_
113 printk(KERN_INFO "Bye!\n");
114
115 if (kexec_reinit)
116 kexec_reinit();
117 local_irq_
118 local_fiq_
119 setup_mm_
120 flush_cache_all();
121 outer_flush_all();
122 outer_disable();
123 cpu_proc_fin();
124 outer_inv_all();
125 flush_cache_all();
126 cpu_reset(
127 }
(gdb) break machine_kexec.c:111
Breakpoint 1 at 0xc0051bc4: file /home/flag/
(gdb) c
Continuing.
via ssh i execute kexec:
flag@omap:~$ uname -a 2.6.38- 1208-omap4 --initrd= /boot/initrd. img-2.6. 38-1208- omap4 --command- line="` cat /proc/cmdline`"
Linux omap 2.6.38-1208-omap4 #11 PREEMPT Thu Apr 28 10:44:16 CEST 2011 armv7l GNU/Linux
flag@omap:~$ sudo kexec -l /boot/vmlinuz-
flag@omap:~$ sudo kexec -e
on the serial console:
[ 72.652587] sys_reboot::380 address: 0x80008000 reboot_code_buffer: ee4af000 reboot_ code_buffer_ phys: 0xae4af000 relocate_ new_kernel_ size: 136
[ 72.655517] kernel_kexec::1504
[ 72.986114] omapdss DISPC error: timeout waiting for EVSYNC
[ 73.087677] omapdss DISPC error: timeout waiting for EVSYNC
[ 73.093566] Starting new kernel
[ 73.097625] machine_kexec::89
[ 73.100738] machine_kexec::102
[ 73.104339] machine_kexec::109 kexec_start_
back to gdb:
Breakpoint 1, machine_kexec (image=<value optimized out>) at /home/flag/ canonical/ ubuntu- natty/arch/ arm/kernel/ machine_ kexec.c: 111 range(( unsigned long) reboot_code_buffer, code_buffer_ phys
111 flush_icache_
(gdb) p/x reboot_code_buffer
$3 = 0xee4af000
(gdb) p/x reboot_
$4 = 0xae4af000
everything looks good, advance till line 113:
(gdb) p/x reboot_ code_buffer_ phys
$5 = 0xae4af000
(gdb) p/x reboot_code_buffer
$6 = 0x0
here is the first weird thing, reboot_code_buffer is NULL now, advance some more till line 122.
at this point the kernel relocation code (arch/arm/ kernel/ relocate_ kernel. S::relocate_ new_kernel( )) it has already been copied to reboot_code_buffer (see the memcpy at line 105), and the caches have been flushed, so i expect to find that routine there - and dumping the memory confirms it:
(gdb) x/20 relocate_new_kernel new_kernel> : 0xe59f0074 0xe59f106c 0xe3500000 0x0a000014 new_kernel+ 16>: 0xe4903004 0xe3130001 0x0a000001 0xe3c34001 new_kernel+ 32>: 0xeafffffa 0xe3130002 0x0a000001 0xe3c30002 new_kernel+ 48>: 0xeafffff6 0xe3130004 0x0a000000 0xea000008 new_kernel+ 64>: 0xe3130008 0x0afffff1 0xe3c33008 0xe3a06b01
0xc0051dac <relocate_
0xc0051dbc <relocate_
0xc0051dcc <relocate_
0xc0051ddc <relocate_
0xc0051dec <relocate_
(gdb) x/20 0xee4af000 (< this is the relocate_ code_buffer value AKA reboot_code_buffer as the MMU sees it before turning 0 - see above)
0xee4af000: 0xe59f0074 0xe59f106c 0xe3500000 0x0a000014
0xee4af010: 0xe4903004 0xe3130001 0x0a000001 0xe3c34001
0xee4af020: 0xeafffffa 0xe3130002 0x0a000001 0xe3c30002
0xee4af030: 0xeafffff6 0xe3130004 0x0a000000 0xea000008
0xee4af040: 0xe3130008 0x0afffff1 0xe3c33008 0xe3a06b01
dump the physical memory via openocd to double check:
(gdb) mon mdw phys 0xae4af000 20
0xae4af000: e59f0074 e59f106c e3500000 0a000014 e4903004 e3130001 0a000001 e3c34001
0xae4af020: eafffffa e3130002 0a000001 e3c30002 eafffff6 e3130004 0a000000 ea000008
0xae4af040: e3130008 0afffff1 e3c33008 e3a06b01
fine, everything is setup, let's step till line 126, the last line:
(gdb) p/x reboot_ code_buffer_ phys
$7 = 0x815d8000
uh? what? this is wrong, why has reboot_ code_buffer_ phys changed?
as memory dump shows:
(gdb) mon mdw phys 0x815d8000 20
0x815d8000: 80001001 bef27008 80008001 bef35008 bf000008 bef36008 bef37008 befd2008
0x815d8020: befd3008 bf002008 bf003008 bef24008 bef25008 befd4008 befd5008 befd6008
0x815d8040: befd7008 bef20008 bef21008 bef22008
this is NOT the relocation routine that was copied with memcpy(), and that is NOT the memory location we should jump!
let's stepi a bit more:
(gdb) stepi reboot_ code_buffer_ phys); canonical/ ubuntu- natty/arch/ arm/mm/ proc-v7. S:64
0xc0051c60 126 cpu_reset(
(gdb) stepi
cpu_v7_reset () at /home/flag/
64 mov pc, r0
(gdb) p/x $r0
$8 = 0x815d8000
(gdb) stepi
0x815d8000 in ?? ()
(gdb) c
Continuing.
and here is the panic:
[ 518.057678] omap_device: omap-aess-audio.-1: new worst case activate latency 0: 152587 kexec_loaded kexec+0x15c/ 0x198 kexec+0x15c/ 0x198) from [<c00c091c>] (kernel_ kexec+0xd8/ 0x104) kexec+0xd8/ 0x104) from [<c009ec90>] (sys_reboot+ 0x190/0x214) 0x190/0x214) from [<c004c340>] (ret_fast_ syscall+ 0x0/0x30)
[ 518.067565] omap_device: omap-aess-audio.-1: new worst case deactivate latency 0: 61035
[ 518.089172] Internal error: Oops - undefined instruction: 0 [#1] PREEMPT
[ 518.096221] last sysfs file: /sys/kernel/
[ 518.101501] Modules linked in: wl12xx wl12xx_sdio btsdio
[ 518.107086] CPU: 0 Tainted: G W (2.6.38-1208-omap4 #11)
[ 518.113739] PC is at 0x815dc040
[ 518.117065] LR is at machine_
[ 518.121704] pc : [<815dc040>] lr : [<c0051c64>] psr: 600000d3
[ 518.121704] sp : c170de50 ip : 000098c9 fp : 000098c9
[ 518.133758] r10: 00000105 r9 : 00000000 r8 : 815d8000
[ 518.139221] r7 : c170c000 r6 : 00008000 r5 : 815d8000 r4 : c0051e24
[ 518.146057] r3 : 00000002 r2 : 00000005 r1 : 00008000 r0 : 00000000
[ 518.152893] Flags: nZCv IRQs off FIQs off Mode SVC_32 ISA ARM Segment user
[ 518.160552] Control: 10c5387d Table: ae4dc059 DAC: 00000015
[ 518.166564] Process kexec (pid: 542, stack limit = 0xc170c2f0)
[ 518.172668] Stack: (0xc170de50 to 0xc170e000)
[ 518.177246] de40: ee4af000 ae4af000 00000088 00000000
[ 518.185821] de60: c08a64ac c178cc00 fee1dead 00000000 c004c528 c00c091c 45584543 45584543
[ 518.194396] de80: c0835ed0 c009ec90 00000002 c04a8908 ef9cb800 00000002 00000000 ffffffef
[ 518.202972] dea0: 00000000 c05d45b0 ef9cb800 00001001 00000000 00000000 c160b000 c00abc58
[ 518.211517] dec0: 00000000 00000002 ef9cb800 c04a294c ef9cb800 00001003 c170ded8 c170ded8
[ 518.220092] dee0: 00000002 00000000 ee584f40 c04f50bc ee48a220 ee48a220 ef4adc00 ef6eef00
[ 518.228668] df00: ffffffff c0164480 ef4adc00 00000000 ef6eef00 00000008 ef4adc20 00000000
[ 518.237243] df20: ef6eef00 00000000 ef6eef00 c0143a2c 00000014 ef6eef00 00000000 c0143c58
[ 518.245819] df40: c1608e00 ef4adc20 ef6eef00 c0132600 00000000 00000000 c1608e14 c1608e00
[ 518.254394] df60: 00000000 c160ccc0 000241b8 c014c224 c1608e00 00000000 c160ccc0 c012eebc
[ 518.262939] df80: 00000003 00000001 c170c000 cdcefa50 00000002 400e95e7 00000000 40101fd0
[ 518.271514] dfa0: 00000058 c004c340 400e95e7 00000000 fee1dead 28121969 45584543 00000000
[ 518.280090] dfc0: 400e95e7 00000000 40101fd0 00000058 00000000 00000001 000241c8 00000105
[ 518.288665] dfe0: be967728 be967718 0000ab4b 401a47b0 80000010 fee1dead 00000000 00000000
[ 518.297271] [<c0051c64>] (machine_
[ 518.306518] [<c00c091c>] (kernel_
[ 518.315460] [<c009ec90>] (sys_reboot+
[ 518.324584] Code: 00000000 00000000 c15dc038 c15dc038 (ee4f1160)
[ 518.331207] usb 1-1.1: ifconfig timed out on ep0out len=0/4
[ 518.337066] smsc95xx 1-1.1:1.0: eth0: Failed to write register index 0x00000014
init: rsyslog main process (347) killed by SEGV signal
init: rsyslog main process ended, respawning
[ 518.373504] smsc95xx 1-1.1:1.0: eth0: Failed to write HW_CFG_LRST_ bit in HW_CFG register, ret = -110
[ 518.549499] ---[ end trace da227214a82491ba ]---
to me, it looks like a memory (stack?) corruption in the final part of machine_kexec(), anyone with mmu/openocd/gdb knowledge is welcome at this point.