Ubuntu16.10:installation fails on Brazos system (31TB and 192 cores) No memory for flatten_device_tree (no room)

Bug #1614309 reported by bugproxy
Affects          Importance   Assigned to
linux (Ubuntu)   High         Tim Gardner
Xenial           Undecided    Tim Gardner
Yakkety          High         Tim Gardner

Bug Description

== Comment: #0 - Praveen K. Pandey - 2016-07-16 10:00:52 ==
Hi,

I tried installing Ubuntu 16.10 on a Brazos system (31 TB of memory and 192 cores). The installation fails during prom_init while trying to allocate memory for the flattened device tree. System firmware is FW860.00 (TC860_020).

Steps to reproduce:

1- Configure a PowerVM system with a profile of 31 TB of memory and 192 cores
2- Start the Ubuntu 16.10 installation

Actual result:

Installation drops to the firmware console after failing during prom_init.

Expected result:

Installation should complete successfully.

Log:

 Booting a command list

OF stdout device is: /vdevice/vty@30000000nitrd.gz 34.84MiB 100% 3.01MiB/s ]
Preparing to boot Linux version 4.4.0-30-generic (buildd@bos01-ppc64el-023) (gcc version 5.3.1 20160413 (Ubuntu/IBM 5.3.1-14ubuntu2.1) ) #49-Ubuntu SMP Fri Jul 1 10:00:36 UTC 2016 (Ubuntu 4.4.0-30.49-generic 4.4.13)
Detected machine type: 0000000000000101
Max number of cores passed to firmware: 256 (NR_CPUS = 2048)
Calling ibm,client-architecture-support... done
command line: BOOT_IMAGE=/ubuntu-installer/ppc64el/yakkety/vmlinux tasks=standard pkgsel/language-pack-patterns= pkgsel/install-language-support=false DEBIAN_FRONTEND=text locale=en_US priority=low console-setup/ask_detect=false console-setup/layoutcode=us netcfg/disable_dhcp=true netcfg/confirm_static=true netcfg/choose_interface=6c:ae:8b:6a:81:98 netcfg/get_ipaddress=9.40.193.34/24 netcfg/get_gateway=9.40.193.1 netcfg/get_nameservers=9.3.1.200 netcfg/get_hostname=ltc-brazos1.aus.stglabs.ibm.com suite=yakkety scsi_mod.scan=sync sshd netcfg/get_domain=aus.stglabs.ibm.com unrestricted= url=tftp://9.3.80.16/tftpboot/ubuntu-installer/ppc64el/ubuntu-server.seed-01-6c-ae-8b-6a-81-98 anna/choose_modules=network-console network-console/start=continue network-console/password=passw0rd network-console/password-again=passw0rd
memory layout at init:
  memory_limit : 0000000000000000 (16 MB aligned)
  alloc_bottom : 000000000e3e0000
  alloc_top : 0000000010000000
  alloc_top_hi : 0000000010000000
  rmo_top : 0000000010000000
  ram_top : 0000000010000000
instantiating rtas at 0x000000000e9e0000... done
prom_hold_cpus: skipped
copying OF device tree...
Building dt strings...
Building dt structure...
No memory for flatten_device_tree (no room)
EXIT called ok
0 >

Regards
Praveen
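The "memory layout at init" values in the log above can be read off mechanically. The following is a minimal sketch, not part of the original report: it parses the hex fields and computes the gap between alloc_bottom and alloc_top. Treating that gap as the space left for boot-time allocations is an assumption made for illustration; the log also shows RTAS being instantiated at 0x000000000e9e0000, inside the same window, so the room actually left for the device tree is presumably smaller.

```python
import re

# Excerpt of the "memory layout at init" block from the console log above.
LOG = """\
memory layout at init:
  memory_limit : 0000000000000000 (16 MB aligned)
  alloc_bottom : 000000000e3e0000
  alloc_top : 0000000010000000
  alloc_top_hi : 0000000010000000
  rmo_top : 0000000010000000
  ram_top : 0000000010000000
"""

MIB = 1024 * 1024


def parse_layout(text):
    """Return {field: int} for every 'name : <16 hex digits>' line."""
    pattern = re.compile(r"^[ \t]*(\w+)[ \t]*:[ \t]*([0-9a-fA-F]{16})\b", re.MULTILINE)
    return {name: int(value, 16) for name, value in pattern.findall(text)}


layout = parse_layout(LOG)
# Assumption: alloc_top - alloc_bottom approximates the space still free at init.
headroom = layout["alloc_top"] - layout["alloc_bottom"]

print(f"rmo_top  = {layout['rmo_top'] // MIB} MiB")
print(f"headroom = {headroom / MIB:.3f} MiB")
```

For this log the sketch reports an rmo_top of 256 MiB and roughly 28 MiB between alloc_bottom and alloc_top.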

== Comment: #8 - Sukadev Bhattiprolu - 2016-08-05 02:27:07 ==
Posted a couple of patches to the community for review:

http://marc.info/?l=linux-kernel&m=147037748624043&w=2
http://marc.info/?l=linux-kernel&m=147037768124064&w=2

== Comment: #12 - Sukadev Bhattiprolu - 2016-08-17 19:23:22 ==

Temporary fix to address the issue while the above patches are being reviewed.

==

Canonical,

Please add the attached temporary fix to 16.10 to address the following issue while the upstream changes are being considered.

   When booting a very large system with a large initrd we run out of space
   for the flattened device tree (FDT). To fix this we must increase the
   space allocated for the RMA region.
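As a rough illustration of why a larger RMA helps, the sketch below compares the space above alloc_bottom for a 256 MiB RMA (the rmo_top of 0x10000000 reported in the console log) and for the 512 MB named in the attached patch title. It assumes alloc_bottom stays where the first log shows it when the RMA grows, which this report does not confirm; the function name is hypothetical.

```python
MIB = 1024 * 1024

# alloc_bottom as reported in the first console log of this bug.
ALLOC_BOTTOM = 0x0E3E0000


def headroom_mib(rma_mib, alloc_bottom=ALLOC_BOTTOM):
    """MiB between alloc_bottom and the top of an RMA of rma_mib MiB.

    Assumption: alloc_bottom does not move when the RMA size changes.
    """
    return (rma_mib * MIB - alloc_bottom) / MIB


for rma in (256, 512):
    print(f"RMA {rma} MiB -> {headroom_mib(rma):.3f} MiB above alloc_bottom")
```

Under that assumption the space above alloc_bottom grows from about 28 MiB to about 284 MiB.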

bugproxy (bugproxy) wrote : Patch to increase RMA size to 512MB

Default Comment by Bridge

tags: added: architecture-ppc64le bugnameltc-143820 severity-critical targetmilestone-inin1610
Changed in ubuntu:
assignee: nobody → Taco Screen team (taco-screen-team)
affects: ubuntu → linux (Ubuntu)
Changed in linux (Ubuntu):
assignee: Taco Screen team (taco-screen-team) → Canonical Kernel Team (canonical-kernel-team)
importance: Undecided → High
status: New → Triaged
Tim Gardner (timg-tpi) wrote :
Changed in linux (Ubuntu Xenial):
assignee: nobody → Tim Gardner (timg-tpi)
status: New → In Progress
Changed in linux (Ubuntu Yakkety):
assignee: Canonical Kernel Team (canonical-kernel-team) → Tim Gardner (timg-tpi)
status: Triaged → Fix Committed
bugproxy (bugproxy) wrote :

Default Comment by Bridge

Changed in linux (Ubuntu Xenial):
status: In Progress → Fix Committed
Tim Gardner (timg-tpi) wrote :

This bug is awaiting verification that the kernel in -proposed solves the problem. Please test the kernel and update this bug with the results. If the problem is solved, change the tag 'verification-needed-xenial' to 'verification-done-xenial'.

If verification is not done by 5 working days from today, this fix will be dropped from the source code, and this bug will be closed.

See https://wiki.ubuntu.com/Testing/EnableProposed for documentation how to enable and use -proposed. Thank you!

tags: added: verification-needed-xenial
bugproxy (bugproxy) wrote : Comment bridged from LTC Bugzilla

------- Comment From <email address hidden> 2016-09-07 00:32 EDT-------
(In reply to comment #19)
> This bug is awaiting verification that the kernel in -proposed solves the
> problem. Please test the kernel and update this bug with the results. If the
> problem is solved, change the tag 'verification-needed-xenial' to
> 'verification-done-xenial'.
>
> If verification is not done by 5 working days from today, this fix will be
> dropped from the source code, and this bug will be closed.
>
> See https://wiki.ubuntu.com/Testing/EnableProposed for documentation how to
> enable and use -proposed. Thank you!

Hi Canonical,

Thanks for the fix!
I need some help: I hit this issue during installation. Could you please tell me how to enable the proposed repo during installation?

Regards
Praveen

bugproxy (bugproxy) wrote :

------- Comment From <email address hidden> 2016-09-12 16:18 EDT-------
Hello Canonical, our test team is reporting that this bug is not fixed in -proposed.

Hi Gary, Kevin,

It seems the issue is not fixed in the proposed repo either.

Setup I used:

1- Set up a netboot server using the repo

"http://ports.ubuntu.com/ubuntu-ports/dists/yakkety-proposed/main/installer-ppc64el"

2- During installation, pass the boot parameter
apt-setup/proposed=true

Regards
Praveen

Hello Praveen,
Please attach the console log output as you did in the original comment of this bug so we can determine the kernel level you are booting.

Thanks, Gary

bugproxy (bugproxy) wrote :

------- Comment From <email address hidden> 2016-09-12 17:12 EDT-------
The current netboot image in proposed is pointing to:

20101020ubuntu473

But, this is from 8-26 and is still using the 4.4.0-9136.55 kernel:

Preparing to boot Linux version 4.4.0-9136-generic (buildd@bos01-ppc64el-024) (gcc version 5.4.1 20160803 (Ubuntu/IBM 5.4.1-1ubuntu2) ) #55-Ubuntu SMP Fri Aug 26 05:56:24 UTC 2016 (Ubuntu 4.4.0-9136.55-generic 4.4.16)
Detected machine type: 0000000000000101
Max number of cores passed to firmware: 256 (NR_CPUS = 2048)
Calling ibm,client-architecture-support... done
command line: BOOT_IMAGE=ubuntu-installer/ppc64el/vmlinux tasks=standard pkgsel/language-pack-patterns= pkgsel/install-language-support=false apt-setup/proposed=true --- quiet
memory layout at init:
memory_limit : 0000000000000000 (16 MB aligned)
alloc_bottom : 000000000e410000
alloc_top : 0000000010000000
alloc_top_hi : 0000000010000000
rmo_top : 0000000010000000
ram_top : 0000000010000000
instantiating rtas at 0x000000000e9e0000... done
prom_hold_cpus: skipped
copying OF device tree...
Building dt strings...
Building dt structure...
No memory for flatten_device_tree (no room)
EXIT called ok
0 >

Canonical,

Please clarify what you are asking to be tested for this install issue.

Thanks.
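Since the question in this exchange is which kernel the netboot image actually boots, the version can be pulled out of the "Preparing to boot" line of a console log. This is a minimal sketch; the regular expressions are assumptions fitted to the two log lines quoted in this bug and may not match other banner formats.

```python
import re

# Patterns fitted to the banner lines quoted in this bug (an assumption).
RELEASE_RE = re.compile(r"Preparing to boot Linux version (\S+)")
PACKAGE_RE = re.compile(r"\(Ubuntu (\d+\.\d+\.\d+-\d+\.\d+)-\w+ ")


def booted_kernel(console_log):
    """Return (kernel release, Ubuntu package version), or None if not found."""
    release = RELEASE_RE.search(console_log)
    package = PACKAGE_RE.search(console_log)
    if not (release and package):
        return None
    return release.group(1), package.group(1)


LINE = ("Preparing to boot Linux version 4.4.0-9136-generic "
        "(buildd@bos01-ppc64el-024) (gcc version 5.4.1 20160803 "
        "(Ubuntu/IBM 5.4.1-1ubuntu2) ) #55-Ubuntu SMP Fri Aug 26 05:56:24 UTC 2016 "
        "(Ubuntu 4.4.0-9136.55-generic 4.4.16)")

print(booted_kernel(LINE))
```

For the log above this yields the release 4.4.0-9136-generic and the package version 4.4.0-9136.55.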

Tim Gardner (timg-tpi) wrote :

You should be testing at least linux 4.4.0-38.57 in proposed.

bugproxy (bugproxy) wrote :

------- Comment From <email address hidden> 2016-09-13 10:16 EDT-------
(In reply to comment #30)
> You should be testing at least linux 4.4.0-38.57 in proposed.

Hi Tim.

Do you have specific instructions on how to do so for the *install*? Since the netboot image is not using that kernel, the system doesn't make it far enough into the install process to pick up a non-install kernel like mentioned above.

Thanks

Tim Gardner (timg-tpi) wrote :

I guess you'll have to wait until linux 4.4.0-38.57 has been promoted to updates since the net boot images are not updated until then. I won't revert this patch for lack of verification. It is obviously powerpc specific and should not cause regressions.

bugproxy (bugproxy) wrote :

------- Comment From <email address hidden> 2016-09-13 10:45 EDT-------
(In reply to comment #31)
> (In reply to comment #30)
> > You should be testing at least linux 4.4.0-38.57 in proposed.
>
> Hi Tim.
>
> Do you have specific instructions on how to do so for the *install*? Since
> the netboot image is not using that kernel, the system doesn't make it far
> enough into the install process to pick up a non-install kernel like
> mentioned above.
>
> Thanks

Hi Tim at Canonical,
I'd like to clarify.
This install bug occurs very early in the boot of the install image. We will need a netboot install image with the 4.4.0-38.57 kernel, or later, in order to test this fix.
Thanks, Gary

bugproxy (bugproxy) wrote :

------- Comment From <email address hidden> 2016-09-13 14:44 EDT-------
(In reply to comment #35)
> I guess you'll have to wait until linux 4.4.0-38.57 has been promoted to
> updates since the net boot images are not updated until then. I won't revert
> this patch for lack of verification. It is obviously powerpc specific and
> should not cause regressions.

Hi Tim,
Please let us know when a netboot image containing a kernel with these patches is available, be it 4.4, 4.6, 4.7, or 4.8.
Thanks, Gary

Launchpad Janitor (janitor) wrote :

This bug was fixed in the package linux - 4.4.0-38.57

---------------
linux (4.4.0-38.57) xenial; urgency=low

  [ Tim Gardner ]

  * Release Tracking Bug
    - LP: #1620658

  * CIFS client: access problems after updating to kernel 4.4.0-29-generic
    (LP: #1612135)
    - Revert "UBUNTU: SAUCE: (namespace) Bypass sget() capability check for nfs"
    - fs: Call d_automount with the filesystems creds

  * apt-key add fails in overlayfs (LP: #1618572)
    - SAUCE: overlayfs: fix regression in whiteout detection

linux (4.4.0-37.56) xenial; urgency=low

  [ Tim Gardner ]

  * Release Tracking Bug
    - LP: #1618040

  * [Feature] Instruction decoder support for new SKX instructions- AVX512
    (LP: #1591655)
    - x86/insn: perf tools: Fix vcvtph2ps instruction decoding
    - x86/insn: Add AVX-512 support to the instruction decoder
    - perf tools: Add AVX-512 support to the instruction decoder used by Intel PT
    - perf tools: Add AVX-512 instructions to the new instructions test

  * [Ubuntu 16.04] FCoE Lun not visible in OS with inbox driver - Issue with
    ioremap() call on 32bit kernel (LP: #1608652)
    - lpfc: Correct issue with ioremap() call on 32bit kernel

  * [Feature] turbostat support for Skylake-SP server (LP: #1591802)
    - tools/power turbostat: decode more CPUID fields
    - tools/power turbostat: CPUID(0x16) leaf shows base, max, and bus frequency
    - tools/power turbostat: decode HWP registers
    - tools/power turbostat: Decode MSR_MISC_PWR_MGMT
    - tools/power turbostat: allow sub-sec intervals
    - tools/power turbostat: Intel Xeon x200: fix erroneous bclk value
    - tools/power turbostat: Intel Xeon x200: fix turbo-ratio decoding
    - tools/power turbostat: re-name "%Busy" field to "Busy%"
    - tools/power turbostat: add --out option for saving output in a file
    - tools/power turbostat: fix compiler warnings
    - tools/power turbostat: make fewer systems calls
    - tools/power turbostat: show IRQs per CPU
    - tools/power turbostat: show GFXMHz
    - tools/power turbostat: show GFX%rc6
    - tools/power turbostat: detect and work around syscall jitter
    - tools/power turbostat: indicate SMX and SGX support
    - tools/power turbostat: call __cpuid() instead of __get_cpuid()
    - tools/power turbostat: correct output for MSR_NHM_SNB_PKG_CST_CFG_CTL dump
    - tools/power turbostat: bugfix: TDP MSRs print bits fixing
    - tools/power turbostat: SGX state should print only if --debug
    - tools/power turbostat: print IRTL MSRs
    - tools/power turbostat: initial BXT support
    - tools/power turbostat: decode BXT TSC frequency via CPUID
    - tools/power turbostat: initial SKX support

  * [BYT] display hotplug doesn't work on console (LP: #1616894)
    - drm/i915/vlv: Make intel_crt_reset() per-encoder
    - drm/i915/vlv: Reset the ADPA in vlv_display_power_well_init()
    - drm/i915/vlv: Disable HPD in valleyview_crt_detect_hotplug()
    - drm/i915: Enable polling when we don't have hpd

  * [Feature]intel_idle enabling on Broxton-P (LP: #1520446)
    - intel_idle: add BXT support

  * [Feature] EDAC: Update driver for SKX-SP (LP: #1591815)
    - [Config] CONFIG_EDAC_SKX=m
    - EDAC, skx_edac: Ad...

Changed in linux (Ubuntu Xenial):
status: Fix Committed → Fix Released
bugproxy (bugproxy) wrote :

------- Comment From <email address hidden> 2016-09-23 01:22 EDT-------
Hi,

Verified using the netboot image from the proposed repo at http://ports.ubuntu.com/ubuntu-ports/dists/yakkety-proposed/main/installer-ppc64el/current/images/netboot/ and I am no longer able to reproduce this issue.

Thanks for the fix!

Regards
Praveen

tags: added: verification-done verification-done-yakkety
removed: verification-needed-xenial
Tim Gardner (timg-tpi)
Changed in linux (Ubuntu Yakkety):
status: Fix Committed → Fix Released
bugproxy (bugproxy) wrote :

------- Comment From <email address hidden> 2016-09-23 16:04 EDT-------
(In reply to comment #39)
> Hi ,
>
> Verified this problem using
> http://ports.ubuntu.com/ubuntu-ports/dists/yakkety-proposed/main/installer-ppc64el/current/images/netboot/
>
> this proposed repo and not able to produce this issue any more .
>
> Thanks for Fix !!
>
> Regards
> Praveen

Closing this bug on the IBM side based on the above comment.
Thanks, Gary
