Kernel 3.10 crashes randomly after upgrade of gcc from 4.7 to 4.8
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
Linaro GCC | Fix Released | Undecided | Unassigned |
Linaro Toolchain Binaries | Fix Released | Undecided | Unassigned |
Bug Description
We found that the kernel crashes in various places after we upgraded gcc from 4.7 to 4.8.
In particular, the crashes appeared when we switched gcc:
From - gcc-linaro-
To - gcc-linaro-
One of the places where the kernel crashes most often is when an MMC request is initiated. I therefore analyzed the crash logs, traced the MMC stack, and created debug builds, and concluded that the scatter-gather list length (nsegs) returned from blk_rq_map_sg() is not valid. Furthermore, the signature of the crashes suggests that the invalid length resembles the value of the Current Program Status Register (CPSR).
Below is a snippet of the crash log; I have attached the complete log for your reference:
=======
[ 83.756520] Internal error: Oops: 5 [#1] PREEMPT SMP ARM
[ 83.761815] Modules linked in:
[ 83.764884] CPU: 0 PID: 977 Comm: mmcqd/1 Tainted: G W 3.10.16+ #3
[ 83.771893] task: ddd75c40 ti: ddfa8000 task.ti: ddfa8000
[ 83.777271] PC is at mmc_queue_
[ 83.781881] LR is at mmc_queue_
[ 83.786569] pc : [<c04d381c>] lr : [<c04d383c>] psr: a0000013
[ 83.786569] sp : ddfa9dd0 ip : ddfa9dd0 fp : ddfa9df4
[ 83.797972] r10: ddce9c00 r9 : 00000000 r8 : ddce9c24
[ 83.803170] r7 : 00000002 r6 : 00002000 r5 : 00000002 r4 : a0000013
[ 83.809661] r3 : c13c7222 r2 : c13c7222 r1 : dbd7d418 r0 : 00000000
[ 83.816154] Flags: NzCv IRQs on FIQs on Mode SVC_32 ISA ARM Segment kernel
[ 83.823422] Control: 10c5387d Table: 5e77006a DAC: 00000015
=======
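To make the resemblance to the CPSR concrete, the following is a minimal sketch that decodes a 32-bit value using the ARMv7-A CPSR bit layout (N, Z, C, V in bits 31 to 28, the I and F mask bits in bits 7 and 6, the mode in bits 4 to 0). The helper name decode_psr and the short mode names are illustrative assumptions, not anything taken from the kernel source. Applied to the r4 value in the log above (a0000013), it yields the same flags and mode that the kernel prints for psr.

```python
# Illustrative helper (hypothetical name): decode a 32-bit value as an ARM CPSR.
FLAG_BITS = (("N", 31), ("Z", 30), ("C", 29), ("V", 28))

# Mode field encodings (bits 4..0); short names are used here for brevity.
MODES = {
    0x10: "USR", 0x11: "FIQ", 0x12: "IRQ", 0x13: "SVC",
    0x17: "ABT", 0x1B: "UND", 0x1F: "SYS",
}

def decode_psr(value):
    """Return the condition flags, interrupt state, and mode encoded in value."""
    # Upper-case letter means the flag is set, lower-case means clear,
    # matching the "Flags: NzCv" convention in the oops output.
    flags = "".join(
        name if (value >> bit) & 1 else name.lower()
        for name, bit in FLAG_BITS
    )
    return {
        "flags": flags,
        "irqs_on": ((value >> 7) & 1) == 0,   # I bit clear -> IRQs enabled
        "fiqs_on": ((value >> 6) & 1) == 0,   # F bit clear -> FIQs enabled
        "mode": MODES.get(value & 0x1F, "unknown"),
    }

if __name__ == "__main__":
    # r4 from the crash log above holds the same value as psr.
    print(decode_psr(0xA0000013))
```

A value that decodes to a valid mode with plausible flag bits is only a hint, since an arbitrary word can happen to match; here the hint is strong because r4 is bit-for-bit identical to the psr printed in the same oops.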
I am aware that there was a known issue in kernel 3.8 related to memset() for the gcc upgrade from 4.7 to 4.8 (https:/
Nevertheless, I noticed that the fix made for memset() applies only to the assembly file memset.S, not to memzero.S. Looking at the disassembly of blk_rq_map_sg(), I can see that memzero() is called, but I am not sure whether this is the root cause of the crashes I have observed.
Lastly, I would like to relate the issue reported here to the one reported at the following link.
https:/
The crash signature looks somewhat similar and it is still an unresolved issue.
=======
List of supporting documents for Linaro to analyze the crash
1. .config
- this is the Kernel config used to compile the Kernel. Kernel version is based on v3.10.
2. queue.c
- additional debug code is added to catch the crash condition earlier in the MMC stack.
- original file located in: /drivers/mmc/card/
3. vmlinux_
- vmlinux, its listing file, and its map file
4. kernel_
- two instances of the kernel crash.
5. Makefile
- shows the compiler configuration option
Note: ARCH=arm, CROSS_COMPILE=
6. mmc_queue_
- annotated disassembly code which has the extra debug code added in queue.c
information type: | Public → Private Security |
information type: | Private Security → Private |
information type: | Private → Public |
Changed in linaro-toolchain-binaries: | |
status: | New → Confirmed |
Changed in linaro-toolchain-binaries: | |
milestone: | none → 2013.12 |
status: | Confirmed → Fix Released |
Changed in gcc-linaro: | |
status: | New → Fix Released |
I should mention that 'nsegs' is stored on the stack, and it appears to be corrupted by the time blk_rq_map_sg() returns.
We have observed other crashes where a value stored on the stack is corrupted and the corrupted value resembles the value stored in the PSR. The following crash logs were obtained from *another Linux image*; I am providing them as additional information to illustrate the point.
=======
Klog-3.TXT
Crash #1
[ 346.623251] Unable to handle kernel paging request at virtual address a000002f cmp+0x18/ 0x4c
[ 346.631225] pgd = d4dc8000
[ 346.635486] [a000002f] *pgd=00000000
[ 346.640859] Internal error: Oops: 5 [#1] PREEMPT SMP ARM
[ 346.646199] Modules linked in: bcmdhd
[ 346.649964] CPU: 1 PID: 2339 Comm: Binder_3 Tainted: G W 3.10.16+ #1
[ 346.657184] task: d1c15200 ti: d1c2e000 task.ti: d1c2e000
[ 346.662592] PC is at plug_rq_
[ 346.666699] LR is at merge+0x40/0x80
[ 346.670286] pc : [<c02a7378>] lr : [<c02d6a44>] psr: a0000013
[ 346.670286] sp : d1c2fc90 ip : d1c2fca0 fp : d1c2fc9c
[ 346.681730] r10: d1c2fd38 r9 : 00000002 r8 : c02a7360
[ 346.686971] r7 : 00000000 r6 : d1c2fca0 r5 : c22853c0 r4 : a0000013
[ 346.693511] r3 : a0000013 r2 : a0000013 r1 : c22853c0 r0 : ddd88000
[ 346.700048] Flags: NzCv IRQs on FIQs on Mode SVC_32 ISA ARM Segment user
[ 346.707183] Control: 10c5387d Table: 64dc806a DAC: 00000015
===========
Klog-6.TXT
Crash #1
[ 8904.709374] Unable to handle kernel paging request at virtual address 6000012f cmp+0x14/ 0x4c 0x194/0x204
[ 8904.719140] pgd = c82c8000
[ 8904.721923] [6000012f] *pgd=00000000
[ 8904.725627] Internal error: Oops: 5 [#1] PREEMPT SMP ARM
[ 8904.731010] Modules linked in:
[ 8904.734229] CPU: 2 PID: 6181 Comm: ing.mp3.android Tainted: G W 3.10.16+ #1
[ 8904.742103] task: d034dc40 ti: c829a000 task.ti: c829a000
[ 8904.747525] PC is at plug_rq_
[ 8904.751700] LR is at list_sort+
[ 8904.755880] pc : [<c02a7374>] lr : [<c02d6c18>] psr: 20000113
[ 8904.755880] sp : c829bba8 ip : c829bbb8 fp : c829bbb4
[ 8904.767394] r10: c829bc20 r9 : 00000001 r8 : d2f074b0
[ 8904.772696] r7 : 60000113 r6 : c02a7360 r5 : 00000000 r4 : c829bc50
[ 8904.779267] r3 : 00000000 r2 : d2f074b0 r1 : 60000113 r0 : 00000000
[ 8904.785847] Flags: nzCv IRQs on FIQs on Mode SVC_32 ISA ARM Segment user
[ 8904.793035] Control: 10c5387d Table: 582c806a DAC: 00000015
Summary:
1. The crashes in Klog-3.txt and Klog-6.txt are informative because they occur at two adjacent instructions (pc c02a7378 and c02a7374). In one case r2 is corrupted, and in the other r1 is corrupted. The two corrupted values should therefore be close to each other in memory. Both r1 and r2 are inputs passed into plug_rq_cmp().
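
The relationship between the faulting addresses and the corrupted registers in the two logs can be checked with a little arithmetic. The sketch below uses only values copied from Klog-3 and Klog-6; the dictionary layout and the function name fault_offsets are illustrative assumptions. In both crashes the faulting virtual address is the PSR-like register value plus the same small offset, which would be consistent with the faulting instruction loading a field at a fixed offset from the corrupted pointer. That interpretation is an assumption, since the disassembly of plug_rq_cmp() is not shown here.

```python
# Values copied from the two oops logs above.
CRASHES = {
    "Klog-3": {"fault_addr": 0xA000002F, "reg": "r2", "reg_value": 0xA0000013},
    "Klog-6": {"fault_addr": 0x6000012F, "reg": "r1", "reg_value": 0x60000113},
}

def fault_offsets(crashes):
    """Return, per log, the distance from the corrupted register to the fault address."""
    return {
        name: info["fault_addr"] - info["reg_value"]
        for name, info in crashes.items()
    }

if __name__ == "__main__":
    for name, offset in fault_offsets(CRASHES).items():
        reg = CRASHES[name]["reg"]
        print(f"{name}: fault address = {reg} + {offset:#x}")
```

Both logs give the same offset (0x1c), even though the corrupted register differs (r2 in one, r1 in the other).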