KEXEC support broken

Bug #517841 reported by Eric Miao on 2010-02-05
This bug affects 2 people
Affects                   Importance   Assigned to
Adana                     High         Unassigned
linux (Ubuntu)            Medium       Bryan Wu
  Nominated for Maverick by Bryan Wu
  Lucid                   Medium       Bryan Wu
linux-ti-omap (Ubuntu)    Medium       Bryan Wu
  Nominated for Maverick by Bryan Wu
  Lucid                   Medium       Bryan Wu

Bug Description

Currently, kexec on Dove fails to reboot into a new kernel.

It fails even with the patches below, which add kexec support on ARMv7:

http://lists.infradead.org/pipermail/linux-arm-kernel/2009-December/006431.html

Someone in the thread above did report a success case, so it may be worth a look.

Eric Miao (eric.y.miao) wrote :

With the following four patches from the reference above, kexec is able to boot into the loaded zImage, but stalls right after "Uncompressing Linux ........................... done".

Eric Miao (eric.y.miao) wrote :

Single-stepping with JTAG showed that some instructions in memory were incorrect, and that the bad instructions started on 32-byte boundaries, which suggests an L2 cache issue. After disabling the L2 cache, kexec was able to boot into the zImage. Using the ARMv7 versions of the cache functions in arch/arm/boot/compressed/head.S also solved the issue, but made decompression slow. Marvell kindly provided the following two patches, which solve the issue completely.
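
The 32-byte pattern is what points at the cache: corruption that comes and goes in whole cache-line-sized blocks is typical of stale or unflushed lines. As a rough illustration only (this is not the tooling used above), the following Python sketch compares a memory dump against the expected image in 32-byte blocks. The function name, the 32-byte line size default, and the synthetic data are all assumptions made for the example.

```python
def corrupt_cache_lines(expected: bytes, actual: bytes, line_size: int = 32):
    """Return start offsets of line_size-aligned blocks that differ.

    Hypothetical helper: 'expected' would be the image as loaded,
    'actual' a dump of the same region read back over JTAG.
    """
    length = min(len(expected), len(actual))
    bad = []
    for start in range(0, length, line_size):
        end = start + line_size
        if expected[start:end] != actual[start:end]:
            bad.append(start)
    return bad


# Synthetic data: one whole 32-byte block (offset 0x40) replaced by zeros.
expected = bytes(range(256))
actual = bytearray(expected)
actual[64:96] = b"\x00" * 32

print([hex(off) for off in corrupt_cache_lines(expected, bytes(actual))])
```

If every mismatch starts on a multiple of the line size and spans whole blocks, a cache maintenance problem is a more likely suspect than, say, a bad copy loop.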

Eric Miao (eric.y.miao) wrote :

NCommander reported initramfs not being loaded correctly, and was able to reproduce that with the following suspicious kernel messages:

[ 0.298097] Trying to unpack rootfs image as initramfs...
[ 0.298372] rootfs image is not initramfs (no cpio magic); looks like an initrd

And

[ 3.323620] RAMDISK: Couldn't find valid RAM disk image starting at 0.

It looks like the initrd image is not recognized correctly, so the kernel falls back to the legacy RAMDISK path. Turning off the L2 cache doesn't solve this issue.
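
For context, the "no cpio magic" message means the data at the initrd address did not start with a cpio header once unpacked. The Python sketch below is a simplified approximation of that kind of check, not the kernel's actual code: it only handles gzip-compressed or uncompressed images, and the function name and return strings are invented for illustration.

```python
import gzip
import zlib

CPIO_MAGICS = (b"070701", b"070702")  # "newc" format, with and without CRC
GZIP_MAGIC = b"\x1f\x8b"


def classify_rootfs_image(data: bytes) -> str:
    """Roughly classify a buffer the way an initramfs unpacker might."""
    if data[:2] == GZIP_MAGIC:
        try:
            data = gzip.decompress(data)
        except (OSError, EOFError, zlib.error):
            return "corrupt gzip stream"
    if data[:6] in CPIO_MAGICS:
        return "initramfs (cpio)"
    return "not initramfs (no cpio magic)"


# A stand-in for a valid image: a gzip stream that begins with cpio magic.
good_image = gzip.compress(b"070701" + b"0" * 104)
# The same region after being overwritten with unrelated data.
clobbered_image = bytes(len(good_image))

print(classify_rootfs_image(good_image))
print(classify_rootfs_image(clobbered_image))
```

An image that was valid when loaded but fails this kind of check at boot suggests the memory was modified in between, rather than that the file on disk is bad.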

Eric Miao (eric.y.miao) wrote :

Root-caused the initramfs issue: the initramfs data area is overwritten
while the zImage is being decompressed. After modifying kexec-tools to
load the initramfs far away from where the zImage decompression and
kernel relocation happen, it now boots OK. A bad interaction between
kexec and zImage!

The layout is basically as follows (offset is based against physical DRAM
starting address):

0x0000_0000 +------------+
           | Not Used |
0x0000_1000 +------------+
           | ATAGS |
0x0000_8000 +------------+
           | |
           | |
           | zImage |
           | |
                ...
                ...
           | |
0x0080_0000 +------------+
           | |
           | |
           | initrd |
           | |
           | |

0x0000_8000 - 0x0080_0000 is reserved for zImage; decompression can
overwrite the initrd area if the zImage is large enough (which is true
in our case). Moving the initrd from 0x0080_0000 to 0x0800_0000 solved
this issue.
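
The overlap described above can be sketched as a simple range check. In this Python sketch the two initrd offsets and the zImage offset come from the layout above, but the 12 MiB kernel footprint (zImage plus decompressed and relocated kernel) and the 4 MiB initrd size are made-up figures for illustration, and the function names are hypothetical.

```python
ZIMAGE_OFFSET = 0x0000_8000       # from the layout above
OLD_INITRD_OFFSET = 0x0080_0000   # default placement in the layout above
NEW_INITRD_OFFSET = 0x0800_0000   # placement used in the trial

MiB = 1024 * 1024


def ranges_overlap(a_start: int, a_size: int, b_start: int, b_size: int) -> bool:
    """True if [a_start, a_start + a_size) intersects [b_start, b_start + b_size)."""
    return a_start < b_start + b_size and b_start < a_start + a_size


def initrd_is_clobbered(initrd_offset: int, initrd_size: int,
                        kernel_footprint: int) -> bool:
    """Would memory touched during decompression reach into the initrd?"""
    return ranges_overlap(ZIMAGE_OFFSET, kernel_footprint,
                          initrd_offset, initrd_size)


# Assumed sizes, for illustration only.
kernel_footprint = 12 * MiB
initrd_size = 4 * MiB

print(initrd_is_clobbered(OLD_INITRD_OFFSET, initrd_size, kernel_footprint))
print(initrd_is_clobbered(NEW_INITRD_OFFSET, initrd_size, kernel_footprint))
```

With these assumed sizes the first placement overlaps and the second does not, which matches the observed behaviour; the real footprint depends on the kernel build.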

So there are actually two workarounds:

1. a modified kexec, which puts the initramfs data far beyond the zImage area
2. loading an executable vmlinux directly instead of a zImage, so no
decompression is involved (saving decompression time) and no
overwriting occurs.

Option 2 may take some time to implement and verify, so the quick and
dirty solution would be option 1.

And this apparently affects imx51 as well.

Eric Miao (eric.y.miao) wrote :

There is actually a third solution, tested OK, which looks like the most reasonable one:

3. a raw vmlinux binary, i.e. the vmlinux ELF image stripped with 'objcopy'. It can be loaded with an unmodified kexec without problems, since the image is already decompressed and kexec places it at the right address, 0x0000_8000, from which it can be executed directly.

Bryan Wu (cooloney) wrote :

With Eric's and Saeed's help, I got kexec reboot working on the imx51 babbage board. Here is my kernel tree for testing:
http://kernel.ubuntu.com/git?p=roc/ubuntu-lucid.git;a=shortlog;h=refs/heads/kexec

This patch is from Tony (OMAP kernel maintainer), but it is not upstream:
fcfa30b arm: Fix init_atags_procfs() to check tag->hdr.size

These 2 patches are from Saeed and work for both mvl-dove and fsl-imx51. I will ask him to post them upstream:
2f1f269 arm: invalidate TLBs when enabling mmu
0c860f2 arm: disable L2 cache in the v7 finish function

These 5 patches are in the mainline .33 kernel now, so they are good candidates for backporting:
4896ee6 ARM: 5888/1: arm: Update comments in cacheflush.h and remove unnecessary V6 and V7 comments
f8dc814 ARM: 5886/1: arm: Fix cpu_proc_fin() for proc-v7.S and make kexec work
e2f3613 ARM: 5885/1: arm: Flush TLB entries in setup_mm_for_reboot()
4b35822 ARM: 5884/1: arm: Fix DCC console for v7
78661da ARM: 5882/1: ARM: Fix uncompress code compile for different defines of flush(void)

I still need Eric's trick of patching kexec-tools to reboot my kernel with kexec on the fsl-imx51 babbage.

Thanks,
-Bryan

Bryan Wu (cooloney) wrote :

I think the patch 'fcfa30b arm: Fix init_atags_procfs() to check tag->hdr.size' is not necessary here. I tested the kernel without this patch, and kexec reboot works fine.

Please find my git branch here:
http://kernel.ubuntu.com/git?p=roc/ubuntu-lucid.git;a=shortlog;h=refs/heads/kexec

and the kernel package for imx51 is here:
http://people.canonical.com/~roc/kernel/kexec/

Loïc Minier (lool) wrote :

It sounds to me as if kexec should either be rejecting vmlinuz or should be doing the objcopy dance itself or should be placing the vmlinuz at the right place as to not clobber the initramfs -- the kernel is going to end up uncompressed anyway, so kexec should be careful about what it accepts or make sure it works.

Loïc Minier (lool) wrote :

This is fixed in linux-mvl-dove and linux-fsl-imx51; should have been fixed in linux (and propagated) instead of fixing it in these two subtrees, but anyway it's done now. :-)

We should get this fixed in linux and linux-ti-omap to get working kexec support in versatile and omap kernels.

visibility: private → public
summary: - [dove] no KEXEC support
+ KEXEC support broken
Changed in linux (Ubuntu Lucid):
status: New → Confirmed
milestone: none → lucid-updates
Changed in linux-ti-omap (Ubuntu Lucid):
milestone: none → lucid-updates
status: New → Confirmed
Alexander Sack (asac) wrote :

assigning amitk for the lucid/SRU task of -omap.

Changed in linux-ti-omap (Ubuntu Lucid):
assignee: nobody → Amit Kucheria (amitk)
Loïc Minier (lool) wrote :

I think this is fixed in Adana; closing there.

Changed in adana:
status: Confirmed → Fix Released
Loïc Minier (lool) wrote :

15:53 < ericm> lool, both vmlinuz and converted vmlinux worked

Amit Kucheria (amitk) on 2010-05-25
Changed in linux-ti-omap (Ubuntu):
assignee: Amit Kucheria (amitk) → Bryan Wu (cooloney)
Changed in linux-ti-omap (Ubuntu Lucid):
assignee: Amit Kucheria (amitk) → Bryan Wu (cooloney)
Bryan Wu (cooloney) on 2010-08-17
Changed in linux (Ubuntu):
importance: Undecided → Medium
Changed in linux-ti-omap (Ubuntu Lucid):
importance: Undecided → Medium
Changed in linux (Ubuntu):
assignee: nobody → Bryan Wu (cooloney)
Changed in linux-ti-omap (Ubuntu):
importance: Undecided → Medium
Changed in linux (Ubuntu Lucid):
assignee: nobody → Bryan Wu (cooloney)
importance: Undecided → Medium
Bryan Wu (cooloney) wrote :

Tested kexec with the Maverick kernel and Maverick rootfs. It failed: the system hangs after uncompressing the kernel.

I can also reproduce the same kernel oops as bug #588243 when I try to reboot the kernel with kexec.

I also tested the upstream linux-omap tree, where kexec works fine. Maybe we need to backport some patches.

-Bryan

Bryan Wu (cooloney) wrote :

Tested on Lucid: the kernel kexec works, but mounting the root file system fails. I think the kexec function itself is OK in the Lucid kernel.

Loïc Minier (lool) on 2010-08-17
tags: added: armel

On Tue, Aug 17, 2010 at 6:11 PM, Bryan Wu <email address hidden> wrote:
> Tested on Lucid, kernel kexec works. but mounting root file system
> failed. I think kexec function is ok in Lucid kernel.
>

Some of the kexec patches are not merged, especially the ones that
disable L2. Check the latest mailing list posts: Thomas Gleixner is
working on a generic patch for all ARMv7 architectures, which should
possibly work for both dove and omap4.

>
> ** Attachment added: "lucid_dmesg.log"
>   https://bugs.edge.launchpad.net/ubuntu/+source/linux-ti-omap/+bug/517841/+attachment/1494976/+files/lucid_dmesg.log

Bryan Wu (cooloney) wrote :

Eric,

I just followed a thread on the linux-arm list: http://lists.infradead.org/pipermail/linux-arm-kernel/2010-September/025289.html

It looks useful for us, but we still need some extra kexec patches which are not in the 2.6.35 kernel release. From a quick search, I found we need to cherry-pick 4 more patches to make the 1st patch from Thomas Gleixner apply cleanly.

-Bryan

Paolo Pisati (p-pisati) wrote :

Lucid is out of support, and the latest P kernel has the necessary bits to make kexec work on omap3 (and UP omap4), so closing here.

Changed in linux (Ubuntu Lucid):
status: Confirmed → Won't Fix
Changed in linux-ti-omap (Ubuntu):
status: Confirmed → Won't Fix
Changed in linux-ti-omap (Ubuntu Lucid):
status: Confirmed → Won't Fix
Changed in linux (Ubuntu):
status: Confirmed → Won't Fix