kernel linux-lng-preempt-rt won't boot KVM guest when hugepages are enabled
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
linaro-networking | Fix Released | Critical | Kim Phillips |
Bug Description
When hugepages are enabled in the kernel, the KVM guest does not boot.
Mike Holmes (mike-holmes) wrote : | #1 |
Changed in linaro-networking: | |
status: | New → Triaged |
importance: | Undecided → Critical |
Mike Holmes (mike-holmes) wrote : | #2 |
There are two failure cases; this one has no initial network issue:
http://
Kim Phillips (kim-phillips) wrote : | #3 |
Perhaps that's because it's not supported yet?
https:/
Zi Shen Lim (zlim) wrote : | #4 |
Is this (1) hugepages enabled in the guest kernel, (2) hugepages enabled in the host kernel, or (3) both?
Anders Roxell (aroxell) wrote : | #5 |
When hugepages are enabled on the host.
If it's not supported yet, shouldn't the linux-lng kernel also fail to bring up the KVM guest?
But according to the log that works, doesn't it?
Kim Phillips (kim-phillips) wrote : | #6 |
> when hugepages are enabled on the host.
Which .conf should be used to enable hugepages? I don't see an appropriate one. Do you enable them manually? If so, which CONFIG_ symbols need to be set?
Also, is it known whether this still occurs with the 3.10.14 RT9 kernel (esp. the networking failure)?
Mike Holmes (mike-holmes) wrote : | #7 |
See https:/
Kim Phillips (kim-phillips) wrote : | #8 |
Note: I'm not experiencing the udhcpc problem on my local board, but I am in the LAVA lab.
I traced the kernel while qemu was hanging, occupying 99% CPU, and found __get_user_
##### CPU 0 buffer started ####
qemu-system-… [34 truncated trace lines]
Kim Phillips (kim-phillips) wrote : | #9 |
On another note, one of the KVM guys brought this build warning to my attention:
CC arch/arm/
arch/arm/
which appears to be cleared up with this:
http://
which suggests hugepage support was added in v3.11-rc1, which I think is a typo because this commit:
commit 1355e2a6eb88f04
Author: Catalin Marinas <email address hidden>
Date: Wed Jul 25 14:32:38 2012 +0100
ARM: mm: HugeTLB support for LPAE systems.
has a v3.10-rc3-
While looking into getting rid of the build warning by applying those two patches, I see:
commit 4bfab2034bab937
Author: Steven Capper <email address hidden>
Date: Fri Jul 26 14:58:22 2013 +0100
ARM: 7792/1: mm: Remove general hugetlb code from ARM
sitting in upstream Linus' tree, but no "ARM: mm: Remove HugeTLB warning from dma-mapping.
Maxim Uvarov (maxim-uvarov) wrote : | #10 |
This patch removes this warning:
http://
[2/2] ARM: mm: Remove HugeTLB warning from dma-mapping.c
Steve Capper (steve-capper) wrote : | #11 |
I've resent the HugeTLB warning fix patch just now:
http://
I am applying the finishing touches to the fast_gup patch for ARM. I expect to have some code available soon.
Steve Capper (steve-capper) wrote : | #12 |
Ok, I've put up a branch at:
https:/
for-lng/fast_gup is where I'll keep the implementation.
I've tagged the code to: fast_gup_
This works for me on my Arndale when I try a futex on THP tail.
I am going to read over this a bit more and then will submit the patches to lakml.
Please let me know if there are any problems with KVM.
Thanks,
--
Steve
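The "futex on THP tail" case Steve mentions can be approximated from user space roughly as below. This is a minimal sketch, not his actual test: it assumes 4 KB base pages and 2 MB transparent huge pages (as on an LPAE Arndale), and futex_wake_on_thp_tail() is a hypothetical helper. As we understand it, a shared futex (no FUTEX_PRIVATE_FLAG) makes the kernel look up the backing page to build the futex key, which on kernels of this era goes through get_user_pages_fast(). Placing the futex word one small page into a 2 MB-aligned, THP-advised region means that lookup should land on a tail page whenever THP actually backs the region.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <linux/futex.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Assumptions for this sketch: 4 KB base pages, 2 MB transparent huge pages. */
#define SMALL_PAGE_SIZE 4096UL
#define HPAGE_SIZE (2UL * 1024 * 1024)

/*
 * Hypothetical helper: issue a shared FUTEX_WAKE on a word that sits one
 * small page into a 2 MB-aligned, THP-advised anonymous region, i.e. in a
 * tail page whenever THP actually backs the region. Returns the syscall
 * result (number of waiters woken, 0 here) or -1 on error.
 */
long futex_wake_on_thp_tail(void)
{
    size_t len = 2 * HPAGE_SIZE; /* room to align inside the mapping */
    char *raw = mmap(NULL, len, PROT_READ | PROT_WRITE,
                     MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (raw == MAP_FAILED)
        return -1;

    /* Round up to the first 2 MB boundary inside the mapping. */
    char *huge = (char *)(((uintptr_t)raw + HPAGE_SIZE - 1) & ~(HPAGE_SIZE - 1));

#ifdef MADV_HUGEPAGE
    /* Best effort: fails harmlessly (EINVAL) if THP is not configured. */
    (void)madvise(huge, HPAGE_SIZE, MADV_HUGEPAGE);
#endif
    memset(huge, 0, HPAGE_SIZE); /* fault the whole 2 MB range in */

    /* The futex word lives one small page past the huge-page head. */
    int *uaddr = (int *)(huge + SMALL_PAGE_SIZE);
    assert(((uintptr_t)uaddr & (HPAGE_SIZE - 1)) == SMALL_PAGE_SIZE);

    /* No FUTEX_PRIVATE_FLAG: the kernel resolves the backing page. */
    long ret = syscall(SYS_futex, uaddr, FUTEX_WAKE, 1, NULL, NULL, 0);

    munmap(raw, len);
    return ret;
}
```

With no waiters the wake should simply return 0; a hang or fault here on an affected kernel would point at the fast_gup path.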
Christoffer Dall (cdall) wrote : | #13 |
Just to clarify the KVM support for THP.
What we are talking about is host kernel support for THP and how that interacts with KVM.
There were bugs in the previous huge page patch; please keep an eye on kvm-arm-next over the next few days, as we will be merging a better-tested and better-reviewed patch soon.
For the record, without the THP patches for KVM, if THP is enabled in the host kernel, guest memory may be backed by pages that Linux groups as THPs, but the Stage-2 page tables would map those pages using 4K mappings. When the KVM THP patches are present, the Stage-2 entries will be 2MB huge mappings.
Hope this clarifies things.
-Christoffer
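To put rough numbers on the difference Christoffer describes, the sketch below counts the Stage-2 leaf entries needed to map a guest's RAM with 4K pages versus 2MB block mappings (with LPAE and a 4K granule, a 2MB block is a level-2 descriptor). stage2_leaf_entries() is a hypothetical helper, not a KVM function, and the 512 MB guest size in the usage note is an arbitrary example. The point is only that each 2MB mapping replaces 512 4K ones, which shrinks the Stage-2 tables and can reduce TLB pressure.

```c
#include <assert.h>
#include <stdint.h>

#define SZ_4K (4ULL * 1024)
#define SZ_2M (2ULL * 1024 * 1024)

/*
 * Hypothetical helper: leaf entries needed to map `guest_ram` bytes of
 * guest memory when every Stage-2 leaf maps `map_size` bytes
 * (a 4K page, or a 2MB block).
 */
uint64_t stage2_leaf_entries(uint64_t guest_ram, uint64_t map_size)
{
    /* Keep the illustration simple: whole mappings only. */
    assert(map_size != 0 && guest_ram % map_size == 0);
    return guest_ram / map_size;
}
```

For a 512 MB guest this gives 131072 entries with 4K mappings and 256 entries with 2MB mappings.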
Kim Phillips (kim-phillips) wrote : | #14 |
I seem to have found a potential fix to the problem in the LAVA lab.
I've since gone from 100% failure to 100% success, although I've only
tried two so far :) :
http://
http://
The change I made was to the bin/busybox.nosuid binary:
-udhcpc -R -n -p /var/run/
+udhcpc -t 10 -p /var/run/
That is, omit:
-n,--now Exit with failure if lease is not immediately obtained
and raise the retries parameter:
-t,--retries=N Send up to N request packets
(although I can't tell what the default number of retries is).
For more information on the parameters, see
http://
The fix-vs.-workaround argument here is that the lab's DHCP server has
a much higher latency than local development systems. If that's
acceptable, we need to amend the rootfs build to perform the above
changes in the busybox configuration. Any tips on where that lives in
the massively overloaded meta-maze called OE would be appreciated.
If not, I'd like to request root access to the DHCP server for diagnostics.
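The latency argument can be made concrete with a toy timing model. This is a sketch under stated assumptions, not busybox's actual state machine: it assumes that within one discover cycle udhcpc sends up to N DISCOVERs, pauses a fixed timeout after each, and accepts an OFFER that arrives at any point in the cycle. The function names, the 3-second timeout and the 12-second server latency are all hypothetical values for illustration.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Toy model (assumption, not busybox's code): seconds the client listens
 * during one discover cycle of `retries` DISCOVERs, pausing `timeout_s`
 * seconds after each.
 */
int discover_window_s(int retries, int timeout_s)
{
    assert(retries > 0 && timeout_s > 0);
    return retries * timeout_s;
}

/*
 * True if a server that answers `server_latency_s` seconds after the
 * first DISCOVER gets its OFFER to the client before the cycle ends.
 */
bool offer_in_window(int retries, int timeout_s, int server_latency_s)
{
    return server_latency_s < discover_window_s(retries, timeout_s);
}
```

With a 3-second timeout and a server that takes 12 seconds, 3 retries give a 9-second window and miss the OFFER, while -t 10 gives a 30-second window and catches it. Dropping -n changes what happens after a missed window (try again instead of exiting), which this model does not attempt to capture.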
Mike Holmes (mike-holmes) wrote : Re: [Bug 1234718] Re: kernel linux-lng-preempt-rt wont boot kvm guest when hugepages are enabled | #15 |
Matt has also reported sluggish DHCP responses in the LNG lab, so we should
check whether the lab server is performing acceptably.
On Oct 11, 2013 7:10 PM, "Kim Phillips" <email address hidden> wrote:
Mike Holmes (mike-holmes) wrote : | #16 |
This cannot be verified until the udhcp issue in the LNG lab is fixed.
Changed in linaro-networking: | |
status: | Triaged → Fix Committed |
assignee: | nobody → Kim Phillips (kim-phillips) |
Mike Holmes (mike-holmes) wrote : | #17 |
Passes for non-RT: http://
Fails for RT: http://
Patch applied Oct 22, 8:51 am: https:/
Image built Oct 23, 00:59: https:/
Kim Phillips (kim-phillips) wrote : | #18 |
Verify that this is a genuine failure caused specifically by removing the (THP) && !PREEMPT_RT_FULL config exception clause in the LNG kernel, in order to decide whether to switch from this bug to effectively reopening card LNG-17, i.e., new work to provide enough rationale to submit the patch upstream.
Kim Phillips (kim-phillips) wrote : | #19 |
This appears to be working now:
http://
Can anyone point to the original non-DHCP-related failure, if any? If there is none, this bug is just the same as all the other bugs suffering from DHCP problems, and should probably be closed or marked as a duplicate of the bug that specifically targets the DHCP problem.
Mike Holmes (mike-holmes) wrote : | #20 |
Gary is monitoring this patch upstream, and it has been applied to the current LNG kernel.
The bug will be closed when the patch comes back from upstream.
Mike Holmes (mike-holmes) wrote : | #21 |
Gary to check with Steve Capper, since the patch is not in 3.12, the next version LNG is moving to.
Gary S. Robertson (gary-robertson) wrote : | #22 |
Checked with Steve Capper, and his updated patches are still in progress. He expects them to be finished sometime after the start of 2014. In the meantime, we are porting his existing patches to the 3.12 LNG kernel and will revert these and replace them with the updated patches as those become available.
Gary S. Robertson (gary-robertson) wrote : | #23 |
Need to check back with Steve Capper about latest patches. An intermediate version of the patches applied to the 3.10 kernel appeared to cause instability, so we are waiting on the official version to be available.
Gary S. Robertson (gary-robertson) wrote : | #24 |
Oops, cancel that last comment about the instability in the 3.10 kernel: that actually involved the NO_HZ patches rather than the THP patches. However, we still need an update on the status of Capper's new patches.
Gary S. Robertson (gary-robertson) wrote : | #25 |
Steve Capper is actively working to get these patches accepted upstream, but is having to revise the patches to make them more palatable for the upstream maintainers. See Capper's comments in the email thread at:
https:/
Gary S. Robertson (gary-robertson) wrote : | #26 |
Steve Capper's latest comments for those who can't follow the link in the previous comment:
The fast_gup is still being actively worked on. I've summarised what's
going on in the following page:
https:/
Essentially I'm trying to grab some database performance data to
justify the patches because I'm getting raised eyebrows. I am swearing
at databases as we speak...
The hugetlb warning patch has had the commit log rewritten to try and
make it more palatable for upstream and I've sent off a V2 just now
(with you on CC).
Mike Holmes (mike-holmes) wrote : | #27 |
Still waiting on upstream
Gary S. Robertson (gary-robertson) wrote : | #29 |
Friday 07 February: Added Steve Capper's latest patches (written for the 3.13 kernel) to our staging 3.12 kernel. The first of the four patches did not apply cleanly, but after a couple of tries I managed to adapt it. As written, it caused the compiler to attempt to pull in the include file <asm/perf_regs.h>, which does not exist in the 3.12.9 kernel. After resolving this issue and applying all four patches successfully, I attempted to build and test the resulting kernel.
Unfortunately, the kernel build dies with an 'error 2' somewhere in a sub-make trying to build either fs/ext4/built-in.o or fs/built-in.o. I was unable to quickly determine the source or details of the error and consequently tabled further efforts to build with the patch. Maybe I can resume this effort after completing some other tasks needed for LCA14.
Mike Holmes (mike-holmes) wrote : | #30 |
Maybe Steve has a few cycles to help us backport it. It would be good to
get it onto 3.10 as well, so that Keystone can take advantage of it.
On 7 February 2014 19:33, Gary S. Robertson <email address hidden> wrote:
Gary S. Robertson (gary-robertson) wrote : | #31 |
Since KVM guests are booting okay now I think we might close this bug and open a new one instead which is specific to the fact that the latest THP fast-gup patches break our 3.12 staging kernel build. Priority could be lowered as well since our existing patches seem to be doing okay for now.
Mike Holmes (mike-holmes) wrote : | #32 |
Closing, as the original issue is fixed; a new related issue has been found: https:/
Changed in linaro-networking: | |
status: | Fix Committed → Fix Released |
See these regression results for a comparison:
http://validation.linaro.org/dashboard/filters/~aroxell/linux-lng-arndale
validation.linaro.org/dashboard/filters/~aroxell/linux-lng-preempt-rt-arndale
http://
Specifically: http://validation.linaro.org/dashboard/attachment/489303/view#L114