Vexpress-tc2 linaro 13.05 segfaults when cluster DTS node is removed

Bug #1189457 reported by Julien Grall
This bug affects 1 person
Affects | Status | Importance | Assigned to | Milestone
Linaro Linux | Fix Released | Undecided | Tixy (Jon Medhurst) |
Linaro Linux Baseline | Fix Released | Undecided | Tixy (Jon Medhurst) |

Bug Description

I'm trying to boot the latest Linux Linaro 13.05 (ll_20130528.0) without big.LITTLE on the Versatile Express TC2.

I have:
  * modified board.txt to disable the A7 processors and boot on CPU0
  * removed the A7 cpus node in the DTS
  * removed the cluster node in the DTS
  * disabled UEFI and booted directly from the flash

With this configuration, Linux hangs with the following error:
Unable to handle kernel NULL pointer dereference at virtual address 00000000
pgd = c0004000
[00000000] *pgd=00000000
Internal error: Oops: 80000005 [#1] SMP THUMB2
Modules linked in:
CPU: 0 PID: 0 Comm: swapper/0 Not tainted 3.10.0-rc2+ #231
task: c0620448 ti: c0614000 task.ti: c0614000
PC is at 0x0
LR is at sp804_timer_interrupt+0x31/0x34
pc : [<00000000>] lr : [<c001940d>] psr: 200001d3
sp : c0615e20 ip : c00193e5 fp : c0655e34
r10: c0655e20 r9 : 00000000 r8 : 00000000
r7 : c0615ea8 r6 : 00000022 r5 : 00000022 r4 : c0622580
r3 : 00000000 r2 : 00000001 r1 : c0622580 r0 : c0622580
Flags: nzCv IRQs off FIQs off Mode SVC_32 ISA ARM Segment kernel
Control: 50c5387d Table: 8000406a DAC: 55555555

LR: 0xc001938d:
938c 9002b2c9 b1239101 f10b4698 45b30b01 9b03d9da 9a012000 f88d9c02 f88d3017
93ac f8bd2016 f3643016 f363000f b007401f 8ff0e8bd e7ec9702 f7f3b500 f248fce1
93cc f2cc13d0 681b0365 43c06858 bf004770 b500b510 fcd4f7f3 f3bf460c f24c8f4f
93ec f2cc43fc 699b0361 4798b103 13d0f248 f2cc2201 685b0365 462060da 47986823
940c bd102001 b500b570 fcbaf7f3 f3bf4606 f24c8f4f f2cc45fc 69ab0561 4798b103
942c 14d0f248 f2cc2222 68630465 2e02609a 2e03d00b 2623bf0c f3bf2622 69ab8f4f
944c 4798b103 609e6863 f3bfbd70 69ab8f4f 4798b103 686368a2 26e2601a bf00e7ed
946c b500b5f8 fc8cf7f3 14d0f248 0465f2cc 68634607 f3bf689e f3bf8f4f f24c8f4f
948c f2cc45fc 69ab0561 4798b103 601f6863 8f4ff3bf b10369ab f0464798 68630680

SP: 0xc0615da0:
5da0 c0615e18 c0658fe1 c0658fe1 c0267feb 00000001 0000000a 00000006 00000000
5dc0 200001d3 ffffffff c0615e0c 00000000 00000000 c000c975 c0622580 c0622580
5de0 00000001 00000000 c0622580 00000022 00000022 c0615ea8 00000000 00000000
5e00 c0655e20 c0655e34 c00193e5 c0615e20 c001940d 00000000 200001d3 ffffffff
5e20 c0622600 c006ef8d 00000000 c1a324e0 ef006b40 c0655c5b c066d1c0 ef006b40
5e40 00000022 c0615f80 c0615ea8 f0002000 c0614000 c0655de4 00000000 c006f0db
5e60 00000000 ef006b40 00000022 c007103f 00000022 c006e947 c0610e0c c000d5a5
5e80 f000200c 00000012 c061d738 c0008451 c002259a 40000173 ffffffff c0615edc

IP: 0xc0019365:
9364 f900f247 d31442a8 fb0942b8 d810fc00 0c0cebca 0102f1ab 73ecea8c 73eceba3
9384 45433808 b280d205 9002b2c9 b1239101 f10b4698 45b30b01 9b03d9da 9a012000
93a4 f88d9c02 f88d3017 f8bd2016 f3643016 f363000f b007401f 8ff0e8bd e7ec9702
93c4 f7f3b500 f248fce1 f2cc13d0 681b0365 43c06858 bf004770 b500b510 fcd4f7f3
93e4 f3bf460c f24c8f4f f2cc43fc 699b0361 4798b103 13d0f248 f2cc2201 685b0365
9404 462060da 47986823 bd102001 b500b570 fcbaf7f3 f3bf4606 f24c8f4f f2cc45fc
9424 69ab0561 4798b103 14d0f248 f2cc2222 68630465 2e02609a 2e03d00b 2623bf0c
9444 f3bf2622 69ab8f4f 4798b103 609e6863 f3bfbd70 69ab8f4f 4798b103 686368a2
9464 26e2601a bf00e7ed b500b5f8 fc8cf7f3 14d0f248 0465f2cc 68634607 f3bf689e

FP: 0xc0655db4:
5db4 00000000 00000000 c055e2f8 00000000 00000000 00000000 00000000 c055e314
5dd4 00000000 00000000 00000000 00000000 c055e33c 00000000 00000000 00000000
5df4 00000000 c055e32c 00000000 00000000 00000000 00000000 c055e31c 00000000
5e14 00000000 00000000 00000000 c055e34c 00000000 00000000 00000000 00000000
5e34 c055e360 00000000 00000000 00000000 00000000 c055e3fc 00000000 00000000
5e54 00000000 00000000 c055e420 00000000 00000000 00000000 00000000 c055e40c
5e74 00000000 00000000 00000000 00000000 c055e440 00000000 00000000 00000000
5e94 00000000 c055e434 00000000 00000000 00000000 00000000 c055e374 00000000

R0: 0xc0622500:
2500 00000043 00000001 00000000 00000000 00000243 00000001 00010412 00000000
2520 0000045f 00000001 00000000 00000000 00000100 00000000 00000730 00000000
2540 c06581c0 00000000 00000000 00000000 00000000 00000000 00000000 00000000
2560 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
2580 00000000 c001946d 00000000 00000000 ffffffff 7fffffff fff53e18 000003e7
25a0 00003a97 00000000 0020c49c 0000001f 00000001 00000003 00000000 00000000
25c0 c0019411 00000000 00000000 0000000f ffffffff c0a1cd00 0000012c 00000022
25e0 c061c538 c1a34b24 c0627e10 00000000 00000000 00000000 00000000 00000000

R1: 0xc0622500:
2500 00000043 00000001 00000000 00000000 00000243 00000001 00010412 00000000
2520 0000045f 00000001 00000000 00000000 00000100 00000000 00000730 00000000
2540 c06581c0 00000000 00000000 00000000 00000000 00000000 00000000 00000000
2560 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
2580 00000000 c001946d 00000000 00000000 ffffffff 7fffffff fff53e18 000003e7
25a0 00003a97 00000000 0020c49c 0000001f 00000001 00000003 00000000 00000000
25c0 c0019411 00000000 00000000 0000000f ffffffff c0a1cd00 0000012c 00000022
25e0 c061c538 c1a34b24 c0627e10 00000000 00000000 00000000 00000000 00000000

R4: 0xc0622500:
2500 00000043 00000001 00000000 00000000 00000243 00000001 00010412 00000000
2520 0000045f 00000001 00000000 00000000 00000100 00000000 00000730 00000000
2540 c06581c0 00000000 00000000 00000000 00000000 00000000 00000000 00000000
2560 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
2580 00000000 c001946d 00000000 00000000 ffffffff 7fffffff fff53e18 000003e7
25a0 00003a97 00000000 0020c49c 0000001f 00000001 00000003 00000000 00000000
25c0 c0019411 00000000 00000000 0000000f ffffffff c0a1cd00 0000012c 00000022
25e0 c061c538 c1a34b24 c0627e10 00000000 00000000 00000000 00000000 00000000

R7: 0xc0615e28:
5e28 00000000 c1a324e0 ef006b40 c0655c5b c066d1c0 ef006b40 00000022 c0615f80
5e48 c0615ea8 f0002000 c0614000 c0655de4 00000000 c006f0db 00000000 ef006b40
5e68 00000022 c007103f 00000022 c006e947 c0610e0c c000d5a5 f000200c 00000012
5e88 c061d738 c0008451 c002259a 40000173 ffffffff c0615edc f0002000 c000c85b
5ea8 00000001 00000000 00000000 00000000 c06698c0 00000002 00000000 c0615f80
5ec8 f0002000 c0614000 c0655de4 00000000 c0021957 c0615ef0 c0022517 c002259a
5ee8 40000173 ffffffff c066d480 01312d00 00000000 c0616080 00000000 00000014
5f08 c060f690 00000000 c06698c0 c06160c0 ffff8ad2 c0624b70 00200000 c0614010

R10: 0xc0655da0:
5da0 c0075fad 00000000 c055e304 00000000 00000000 00000000 00000000 c055e2f8
5dc0 00000000 00000000 00000000 00000000 c055e314 00000000 00000000 00000000
5de0 00000000 c055e33c 00000000 00000000 00000000 00000000 c055e32c 00000000
5e00 00000000 00000000 00000000 c055e31c 00000000 00000000 00000000 00000000
5e20 c055e34c 00000000 00000000 00000000 00000000 c055e360 00000000 00000000
5e40 00000000 00000000 c055e3fc 00000000 00000000 00000000 00000000 c055e420
5e60 00000000 00000000 00000000 00000000 c055e40c 00000000 00000000 00000000
5e80 00000000 c055e440 00000000 00000000 00000000 00000000 c055e434 00000000
Process swapper/0 (pid: 0, stack limit = 0xc0614238)
Stack: (0xc0615e20 to 0xc0616000)
5e20: c0622600 c006ef8d 00000000 c1a324e0 ef006b40 c0655c5b c066d1c0 ef006b40
5e40: 00000022 c0615f80 c0615ea8 f0002000 c0614000 c0655de4 00000000 c006f0db
5e60: 00000000 ef006b40 00000022 c007103f 00000022 c006e947 c0610e0c c000d5a5
5e80: f000200c 00000012 c061d738 c0008451 c002259a 40000173 ffffffff c0615edc
5ea0: f0002000 c000c85b 00000001 00000000 00000000 00000000 c06698c0 00000002
5ec0: 00000000 c0615f80 f0002000 c0614000 c0655de4 00000000 c0021957 c0615ef0
5ee0: c0022517 c002259a 40000173 ffffffff c066d480 01312d00 00000000 c0616080
5f00: 00000000 00000014 c060f690 00000000 c06698c0 c06160c0 ffff8ad2 c0624b70
5f20: 00200000 c0614010 c1a34ac0 c0614010 0000001b 00000000 c0615f80 f0002000
5f40: 412fc0f1 00000000 00000000 c002290b c0610e0c c000d5a9 f000200c 0000000b
5f60: c061d738 c0008451 c05bd5e4 60000173 ffffffff c0615fb4 c0615fdc c000c85b
5f80: 00000000 c0655b20 00000000 00000000 c0657b00 c061c4c0 c1a2c080 ffffffff
5fa0: c0615fdc 412fc0f1 00000000 00000000 c03fd40b c0615fc8 c03f4bb5 c05bd5e4
5fc0: 60000173 ffffffff ffffffff ffffffff c05bd245 00000000 00000000 c05f0060
5fe0: 50c5387d c061c4f8 c05f005c c06213d4 8000406a 8000807d 00000000 00000000
[<c001940d>] (sp804_timer_interrupt+0x31/0x34) from [<c006ef8d>] (handle_irq_event_percpu+0x55/0x16c)
[<c006ef8d>] (handle_irq_event_percpu+0x55/0x16c) from [<c006f0db>] (handle_irq_event+0x37/0x4c)
[<c006f0db>] (handle_irq_event+0x37/0x4c) from [<c007103f>] (handle_fasteoi_irq+0x53/0xd0)
[<c007103f>] (handle_fasteoi_irq+0x53/0xd0) from [<c006e947>] (generic_handle_irq+0x23/0x2c)
[<c006e947>] (generic_handle_irq+0x23/0x2c) from [<c000d5a5>] (handle_IRQ+0x35/0x70)
[<c000d5a5>] (handle_IRQ+0x35/0x70) from [<c0008451>] (gic_handle_irq+0x2d/0x54)
[<c0008451>] (gic_handle_irq+0x2d/0x54) from [<c000c85b>] (__irq_svc+0x3b/0x5c)
Exception stack(0xc0615ea8 to 0xc0615ef0)
5ea0: 00000001 00000000 00000000 00000000 c06698c0 00000002
5ec0: 00000000 c0615f80 f0002000 c0614000 c0655de4 00000000 c0021957 c0615ef0
5ee0: c0022517 c002259a 40000173 ffffffff
[<c000c85b>] (__irq_svc+0x3b/0x5c) from [<c002259a>] (__do_softirq+0x9a/0x1b4)
[<c002259a>] (__do_softirq+0x9a/0x1b4) from [<c002290b>] (irq_exit+0x73/0x98)
[<c002290b>] (irq_exit+0x73/0x98) from [<c000d5a9>] (handle_IRQ+0x39/0x70)
[<c000d5a9>] (handle_IRQ+0x39/0x70) from [<c0008451>] (gic_handle_irq+0x2d/0x54)
[<c0008451>] (gic_handle_irq+0x2d/0x54) from [<c000c85b>] (__irq_svc+0x3b/0x5c)
Exception stack(0xc0615f80 to 0xc0615fc8)
5f80: 00000000 c0655b20 00000000 00000000 c0657b00 c061c4c0 c1a2c080 ffffffff
5fa0: c0615fdc 412fc0f1 00000000 00000000 c03fd40b c0615fc8 c03f4bb5 c05bd5e4
5fc0: 60000173 ffffffff
[<c000c85b>] (__irq_svc+0x3b/0x5c) from [<c05bd5e4>] (start_kernel+0x210/0x308)
[<c05bd5e4>] (start_kernel+0x210/0x308) from [<8000807d>] (0x8000807d)
Code: bad PC value
---[ end trace da227214a82491b7 ]---

Julien Grall (julien-grall) wrote :

I can still reproduce the issue on the latest Linaro tree (ll_20130821.0). This time I get no call trace from Linux.
If I remove arm,generic from the device tree, I'm able to boot Linux.
The small attached patch also fixes the issue if cpuidle is disabled via the command line.

Tixy (Jon Medhurst) (tixy) wrote : Re: [Bug 1189457] [NEW] Vexpress-tc2 linaro 13.05 segfaults when cluster DTS node is removed

On Mon, 2013-08-26 at 16:06 +0000, Launchpad Bug Tracker wrote:
> You have been subscribed to a public bug by Fathi Boudra (fboudra):
>
> I'm trying to boot the latest linux linaro 13.05 (ll_20130528.0) without
> big.LITTLE on the versatile express TC2.
>
> I have:
> * modified board.txt to disable A7 processors and boot on CPU0
> * remove A7 cpus node in the DTS
> * remove cluster node in the DTS

I assume the pmu_a7 node and gic-cpuif@0, 1 and 2 were also removed?

Generally, removing all entries for one cluster should work. I did this
on the Linux 3.10-based LSK build last week, and DTBs modified like this
are also used successfully with the workload automation tests.

Julien Grall (julien-grall) wrote :

On 08/27/2013 09:29 AM, Tixy (Jon Medhurst) wrote:
> On Mon, 2013-08-26 at 16:06 +0000, Launchpad Bug Tracker wrote:
>> You have been subscribed to a public bug by Fathi Boudra (fboudra):
>>
>> I'm trying to boot the latest linux linaro 13.05 (ll_20130528.0) without
>> big.LITTLE on the versatile express TC2.
>>
>> I have:
>> * modified board.txt to disable A7 processors and boot on CPU0
>> * remove A7 cpus node in the DTS
>> * remove cluster node in the DTS
>
> I assume the pmu_a7 node and gic-cpuif@0, 1 and 2 were also removed?

Yes, I have attached the device tree diff.

> Generally, removing all entries for one cluster should work, I did this
> on the Linux 3.10 based LSK build last week and dtb's modified like this
> are also used successfully with the workload automation tests.

I just tried the latest Linaro tree (ll_20130821.0) again, and the
platform still hangs.

--
Julien Grall

Tixy (Jon Medhurst) (tixy) wrote :

On Tue, 2013-08-27 at 12:44 +0000, Julien Grall wrote:
> Yes, I have add the device tree diff in attachment.
>
> > Generally, removing all entries for one cluster should work, I did this
> > on the Linux 3.10 based LSK build last week and dtb's modified like this
> > are also used successfully with the workload automation tests.
>
> I just tried again the latest Linaro tree (ll_20130821.0) and the
> platform still hangs.

I've just tried the same and it works for me. Perhaps your bootloader is
starting the kernel on the A7 cluster and not the A15?

Tip: to check which CPU the system is booting on, look for a line in the
kernel boot log starting with 'CPU0:', like

   CPU0: thread -1, cpu 0, socket 0, mpidr 80000000

or

   CPU0: thread -1, cpu 0, socket 1, mpidr 80000100

On TC2, the mpidr value is 80000000 for the first A15 cpu and 80000100
for the first A7.
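The mpidr value in that line can also be decoded by hand. The Python sketch below splits a 32-bit MPIDR into its affinity fields the way the 'CPU0: thread ..., cpu ..., socket ...' line reports them. It assumes the multithreading (MT) bit 24 is clear, as the 'thread -1' output suggests for TC2, and takes the cluster mapping (socket 0 = A15, socket 1 = A7) from the values quoted above; the function names are illustrative only.

```python
# Sketch: decode a 32-bit ARMv7 MPIDR into the fields shown in the
# "CPU0: thread ..., cpu ..., socket ..., mpidr ..." boot line.

MT_BIT = 1 << 24  # multithreading bit; assumed clear on TC2 ("thread -1")

# Assumption taken from this thread: on TC2 cluster 0 is A15, cluster 1 is A7.
TC2_CLUSTERS = {0: "A15", 1: "A7"}


def decode_mpidr(mpidr: int) -> dict:
    """Split an MPIDR value into thread/cpu/socket as the topology line does."""
    aff0 = mpidr & 0xFF          # affinity level 0
    aff1 = (mpidr >> 8) & 0xFF   # affinity level 1
    aff2 = (mpidr >> 16) & 0xFF  # affinity level 2
    if mpidr & MT_BIT:
        # Multithreaded layout: thread = Aff0, cpu = Aff1, socket = Aff2.
        return {"thread": aff0, "cpu": aff1, "socket": aff2}
    # Non-threaded layout (TC2): cpu = Aff0, socket (cluster) = Aff1.
    return {"thread": -1, "cpu": aff0, "socket": aff1}


def tc2_cluster_name(mpidr: int) -> str:
    """Map the decoded socket number to a TC2 cluster name (hypothetical helper)."""
    return TC2_CLUSTERS.get(decode_mpidr(mpidr)["socket"], "unknown")


for value in (0x80000000, 0x80000100):
    print(hex(value), decode_mpidr(value), tc2_cluster_name(value))
```

Bit 31 of the value is set on both cores, so only the low affinity bytes distinguish the clusters.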

If you are using bootmon, or UEFI started from bootmon, then the boot
CPU is changed by editing the file SITE1/HBI0249A/board.txt on the
vexpress internal micro-SD card, and clearing bit 28 of the SCC: 0x700
value, e.g. to boot on A15 I used the line:

   SCC: 0x700 0x0032F003

Our standard config in the files on the Linaro release pages has it set
for A7 booting with:

   SCC: 0x700 0x1032F003
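Since the only difference between the two lines is bit 28 (0x10000000), the edit can be checked mechanically. This is a hypothetical helper, not part of any Linaro or ARM tool. It assumes the simple 'SCC: <register> <value>' layout shown above, ignores any trailing text on the line, and does not interpret any other SCC bits.

```python
from typing import Tuple

# Per this thread: bit 28 of the SCC 0x700 value set -> boot on A7,
# cleared -> boot on A15. No other bits are interpreted here.
BOOT_ON_A7_BIT = 1 << 28  # 0x10000000


def parse_scc_line(line: str) -> Tuple[int, int]:
    """Parse a line such as 'SCC: 0x700 0x1032F003' into (register, value)."""
    key, reg, value = line.split()[:3]  # ignore anything after the value
    if key != "SCC:":
        raise ValueError(f"not an SCC line: {line!r}")
    return int(reg, 16), int(value, 16)


def scc_boot_on_a15(value: int) -> int:
    """Clear bit 28 so the board boots on the A15 cluster."""
    return value & ~BOOT_ON_A7_BIT & 0xFFFFFFFF


def scc_boot_on_a7(value: int) -> int:
    """Set bit 28 so the board boots on the A7 cluster."""
    return value | BOOT_ON_A7_BIT


def format_scc_line(reg: int, value: int) -> str:
    return f"SCC: 0x{reg:X} 0x{value:08X}"


reg, value = parse_scc_line("SCC: 0x700 0x1032F003")
print(format_scc_line(reg, scc_boot_on_a15(value)))
```

Run against the standard A7 line, this reproduces the A15 line used above.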

If this isn't the problem, can you provide a log of the serial output
during a failed boot, including all boot monitor and bootloader output
from before the kernel is started?

Thanks

--
Tixy

Tixy (Jon Medhurst) (tixy) wrote :

On Tue, 2013-08-27 at 14:51 +0100, Jon Medhurst (Tixy) wrote:
> Tip, to check which CPU the system is booting on, look for a line in
> kernel boot for 'CPU0:' like
>
> CPU0: thread -1, cpu 0, socket 0, mpidr 80000000

A better line to look out for is the very first line of kernel
output :-)

Booting Linux on physical CPU 0x100

'physical CPU' should be the lower 24 bits of the mpidr register: 0x100
for the first A7, 0x0 for the first A15.
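As a rough illustration, the boot cluster can be read straight from a captured serial log. This sketch assumes the banner has exactly the form quoted above and reuses the TC2 mapping from this thread (bits 15:8 of the value are 0 for A15, 1 for A7); the helper name is made up for the example.

```python
import re
from typing import Optional

# Assumed banner format, as quoted above.
BOOT_LINE = re.compile(r"Booting Linux on physical CPU (0x[0-9a-fA-F]+)")


def boot_cluster(log_text: str) -> Optional[str]:
    """Return "A15"/"A7" for the boot CPU found in a TC2 serial log, or None."""
    match = BOOT_LINE.search(log_text)
    if match is None:
        return None
    hwid = int(match.group(1), 16) & 0x00FFFFFF  # lower 24 bits of the MPIDR
    cluster = (hwid >> 8) & 0xFF                 # cluster number (affinity level 1)
    return {0: "A15", 1: "A7"}.get(cluster, f"cluster {cluster}")


print(boot_cluster("Booting Linux on physical CPU 0x100"))
```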

Julien Grall (julien-grall) wrote :

On 08/27/2013 02:51 PM, Tixy (Jon Medhurst) wrote:
> On Tue, 2013-08-27 at 12:44 +0000, Julien Grall wrote:
>> Yes, I have add the device tree diff in attachment.
>>
>>> Generally, removing all entries for one cluster should work, I did this
>>> on the Linux 3.10 based LSK build last week and dtb's modified like this
>>> are also used successfully with the workload automation tests.
>>
>> I just tried again the latest Linaro tree (ll_20130821.0) and the
>> platform still hangs.
>
> I've just tried the same and it works for me, perhaps your bootloader is
> starting the kernel on the A7 cluster and not the A15?
>
> Tip, to check which CPU the system is booting on, look for a line in
> kernel boot for 'CPU0:' like
>
> CPU0: thread -1, cpu 0, socket 0, mpidr 80000000
>
> or
>
> CPU0: thread -1, cpu 0, socket 1, mpidr 80000100
>
> On TC2, the mpidr value is 80000000 for the first A15 cpu and 80000100
> for the first A7.
>
> If you are using bootmon, or UEFI started from bootmon, then the boot
> CPU is changed by editing the file SITE1/HBI0249A/board.txt on the
> vexpress internal micro-SD card, and clearing bit 28 of the SCC: 0x700
> value, e.g. to boot on A15 I used the line:
>
> SCC: 0x700 0x0032F003
>
> Our standard config in the files on the Linaro release pages has it set
> for A7 booting with:
>
> SCC: 0x700 0x1032F003
>
> I this isn't the problem, can you provide a log of the serial output
> during a failed boot, including all bootmonitor and bootloader output
> from before the kernel is started.

Unfortunately, I still can't boot Linux on the Versatile Express.
board.txt is correctly modified (following
http://releases.linaro.org/13.07/ubuntu/vexpress/).

I have attached the serial output.

Cheers,

--
Julien Grall

Tixy (Jon Medhurst) (tixy) wrote :

On Wed, 2013-08-28 at 15:05 +0000, Julien Grall wrote:
> You can find in attachment the serial output.

From that I see you have boot monitor version V5.1.7, so it looks like
you don't have the required firmware installed on the TC2. I'm pretty
sure you need V5.1.9, which should be in "ARM’s CPU Migration patch" as
linked from the Firmware Update tab of the release pages [1], and which
should also include an updated daughter board BIOS (dbb_v110.ebf). Also,
make sure you have the correct SCC register settings (see the 'Download
additional Linaro firmware' section of the same release note tab).

Basically, without all of these, the boot and power management protocols
will differ between the kernel and the firmware. I suspect that this is
the problem because in your logs I see:

 CPU1: failed to come online

so the second A15 didn't boot, and the board then looks like it resets
shortly after

 CPUidle for CPU0 registered

which is about the time when the system would start trying to power down
idle cores.

[1] http://releases.linaro.org/13.07/android/vexpress

--
Tixy

Julien Grall (julien-grall) wrote :

On 08/28/2013 05:27 PM, Tixy (Jon Medhurst) wrote:
> On Wed, 2013-08-28 at 15:05 +0000, Julien Grall wrote:
>> You can find in attachment the serial output.
>
> From that I see you have boot monitor version V5.1.7, so it looks like
> the required firmware installed on the TC2. I'm pretty sure you need
> need V5.1.9 which should be in "ARM’s CPU Migration patch" as linked
> from the Firmware Update tab of the release pages [1] and that also
> should have an updated daughter board bios (dbb_v110.ebf). Also, make
> sure you have the correct SCC register settings (see the 'Download
> additional Linaro firmware' section of the same release note tab).

Thank you for the hint, it works now.

Actually, thanks to the README in "ARM’s CPU Migration patch", I also
fixed another issue with Xen.

> Basically, without all of these the boot and power management protocols
> will be different between the kernel and the firmware. I suspect that
> this is the problem because your logs I see:
>
> CPU1: failed to come online
>
> so the second A15 didn't boot, and the board then looks like it rests
> shortly after
>
> CPUidle for CPU0 registered
>
> so about the time when the system would start trying to power down idle
> cores.
>
> [1] http://releases.linaro.org/13.07/android/vexpress
>

Cheers,

--
Julien Grall

Fathi Boudra (fboudra)
Changed in linux-linaro:
assignee: nobody → Tixy (Jon Medhurst) (tixy)
Changed in linaro-linux-baseline:
assignee: nobody → Tixy (Jon Medhurst) (tixy)
Changed in linux-linaro:
milestone: none → 13.08
Changed in linaro-linux-baseline:
milestone: none → 13.08
status: New → Fix Released
Changed in linux-linaro:
status: New → Fix Released