amd64 x86-64 boot fails with more than 64 CPUs

Bug #706058 reported by Tim Gardner
This bug affects 2 people
Affects         Status         Importance   Assigned to   Milestone
linux (Ubuntu)  Fix Released   Undecided    Tim Gardner
Lucid           Fix Released   Undecided    Tim Gardner
Maverick        Fix Released   Undecided    Tim Gardner
Natty           Fix Released   Undecided    Tim Gardner

Bug Description

Platforms with more than 64 CPUs fail to boot.

Tim Gardner (timg-tpi) wrote :

SRU Justification

Impact: Servers with more than 64 logical CPUs cannot boot.

Patch Description: Increase CONFIG_NR_CPUS to 256 for the amd64 server flavour

The CPU hotplug subsystem allocates memory using the per-cpu mechanism. CONFIG_NR_CPUS predefines enough memory to accommodate the maximum number of CPUs that can be online. The number of CPUs actually possible on the machine is discovered at boot time and is used to trim the allocated memory down to that real count. In effect, CONFIG_NR_CPUS has little impact on memory consumption except for these 9 variables:

kernel/cpu.c:static DECLARE_BITMAP(cpu_possible_bits, CONFIG_NR_CPUS) __read_mostly
kernel/cpu.c:static DECLARE_BITMAP(cpu_possible_bits, CONFIG_NR_CPUS) __read_mostly;
kernel/cpu.c:static DECLARE_BITMAP(cpu_online_bits, CONFIG_NR_CPUS) __read_mostly;
kernel/cpu.c:static DECLARE_BITMAP(cpu_present_bits, CONFIG_NR_CPUS) __read_mostly;
kernel/cpu.c:static DECLARE_BITMAP(cpu_active_bits, CONFIG_NR_CPUS) __read_mostly;
kernel/sched.c: DECLARE_BITMAP(cpus, CONFIG_NR_CPUS);
kernel/sched.c: DECLARE_BITMAP(span, CONFIG_NR_CPUS);
kernel/sched.c: static DECLARE_BITMAP(tmpmask, CONFIG_NR_CPUS);
mm/slub.c:static DECLARE_BITMAP(kmem_cach_cpu_free_init_once, CONFIG_NR_CPUS);

The increase in memory size of these variables is negligible since they are bitmaps, which use one bit per CPU. For example, increasing from 64 to 256 CPUs adds only 24 bytes per variable (32 bytes instead of 8).
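
The 24-byte figure can be checked with a small userspace sketch. It assumes the bitmaps are arrays of unsigned long rounded up to a whole number of longs, which is how the kernel's DECLARE_BITMAP() and BITS_TO_LONGS() macros size them. The names SKETCH_BITS_PER_LONG, SKETCH_BITS_TO_LONGS, bitmap_bytes and bitmap_growth are made up for this illustration and are not kernel symbols.

```c
#include <assert.h>   /* assert */
#include <limits.h>   /* CHAR_BIT */
#include <stddef.h>   /* size_t */

/* Userspace stand-ins for the kernel's bitmap sizing macros (hypothetical names). */
#define SKETCH_BITS_PER_LONG (CHAR_BIT * sizeof(unsigned long))
#define SKETCH_BITS_TO_LONGS(nr) \
    (((nr) + SKETCH_BITS_PER_LONG - 1) / SKETCH_BITS_PER_LONG)

/* Bytes occupied by a bitmap with one bit for each of nr_cpus CPUs. */
size_t bitmap_bytes(size_t nr_cpus)
{
    return SKETCH_BITS_TO_LONGS(nr_cpus) * sizeof(unsigned long);
}

/* Extra bytes per bitmap when the CPU limit is raised from old to new. */
size_t bitmap_growth(size_t old_nr_cpus, size_t new_nr_cpus)
{
    assert(new_nr_cpus >= old_nr_cpus);
    return bitmap_bytes(new_nr_cpus) - bitmap_bytes(old_nr_cpus);
}
```

Calling bitmap_growth(64, 256) gives 24, which matches the figure above: 8 bytes for 64 CPUs versus 32 bytes for 256 CPUs.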

Tim Gardner (timg-tpi)
Changed in linux (Ubuntu Natty):
status: New → Fix Committed
Changed in linux (Ubuntu Lucid):
assignee: nobody → Tim Gardner (timg-tpi)
status: New → In Progress
Martin Pitt (pitti) wrote : Please test proposed package

Accepted linux into lucid-proposed. The package will build now and be available in a few hours. Please test and give feedback here. See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on how to enable and use -proposed. Thank you in advance!

Changed in linux (Ubuntu Lucid):
status: In Progress → Fix Committed
Brad Figg (brad-figg)
tags: added: verification-needed
Brad Figg (brad-figg) wrote :

This bug is awaiting verification that the kernel in -proposed solves the problem. Please test the kernel and update this bug with the results. If the problem is solved, change the tag 'verification-needed' to 'verification-done'.

If verification is not done within one week from today, this fix will be dropped from the source code, and this bug will be closed.

See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on how to enable and use -proposed. Thank you!

Tim Gardner (timg-tpi) wrote :

Booted 2.6.35-26-server on a 4-way system with more than 64 CPUs.

tags: added: verification-done
removed: verification-needed
Martin Pitt (pitti) wrote : Please test proposed package

Accepted linux into lucid-proposed. The package will build now and be available in a few hours. Please test and give feedback here. See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on how to enable and use -proposed. Thank you in advance!

Steve Conklin (sconklin)
tags: added: verification-needed-lucid
Brad Figg (brad-figg)
tags: added: verification-done-maverick
removed: verification-done
Tim Gardner (timg-tpi) wrote :

Booted 2.6.32-29-server on a 4-way system with more than 64 CPUs.

tags: added: verification-done-lucid
removed: verification-needed-lucid
Changed in linux (Ubuntu Maverick):
status: New → Fix Committed
assignee: nobody → Tim Gardner (timg-tpi)
Changed in linux (Ubuntu Natty):
assignee: nobody → Tim Gardner (timg-tpi)
status: Fix Committed → Fix Released
Martin Pitt (pitti) wrote :

Accepted linux into maverick-proposed. The package will build now and be available in a few hours. Please test and give feedback here. See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on how to enable and use -proposed. Thank you in advance!

Launchpad Janitor (janitor) wrote :

This bug was fixed in the package linux - 2.6.32-29.58

---------------
linux (2.6.32-29.58) lucid-proposed; urgency=low

  [ Brad Figg ]

  * Release Tracking Bug
    - LP: #716551

  [ Upstream Kernel Changes ]

  * net: fix rds_iovec page count overflow, CVE-2010-3865
    - LP: #709153
    - CVE-2010-3865
  * net: ax25: fix information leak to userland, CVE-2010-3875
    - LP: #710714
    - CVE-2010-3875
  * net: ax25: fix information leak to userland harder, CVE-2010-3875
    - LP: #710714
    - CVE-2010-3875
  * net: packet: fix information leak to userland, CVE-2010-3876
    - LP: #710714
    - CVE-2010-3876
  * net: tipc: fix information leak to userland, CVE-2010-3877
    - LP: #711291
    - CVE-2010-3877
  * inet_diag: Make sure we actually run the same bytecode we audited,
    CVE-2010-3880
    - LP: #711865
    - CVE-2010-3880

linux (2.6.32-29.57) lucid-proposed; urgency=low

  [ Steve Conklin ]

  * Tracking Bug
    - LP: #708864

  [ Tim Gardner ]

  * [Config] Set CONFIG_NR_CPUS=256 for amd64 server
    - LP: #706058

  [ Upstream Kernel Changes ]

  * Input: i8042 - introduce 'notimeout' blacklist for Dell Vostro V13
    - LP: #380126
  * tun: avoid BUG, dump packet on GSO errors
    - LP: #698883
  * TTY: Fix error return from tty_ldisc_open()
    - LP: #705045
  * x86, hotplug: Use mwait to offline a processor, fix the legacy case
    - LP: #705045
  * fuse: verify ioctl retries
    - LP: #705045
  * fuse: fix ioctl when server is 32bit
    - LP: #705045
  * ALSA: hda: Use model=lg quirk for LG P1 Express to enable playback and
    capture
    - LP: #595482, #705045
  * nohz: Fix printk_needs_cpu() return value on offline cpus
    - LP: #705045
  * nohz: Fix get_next_timer_interrupt() vs cpu hotplug
    - LP: #705045
  * nfsd: Fix possible BUG_ON firing in set_change_info
    - LP: #705045
  * NFS: Fix fcntl F_GETLK not reporting some conflicts
    - LP: #705045
  * sunrpc: prevent use-after-free on clearing XPT_BUSY
    - LP: #705045
  * hwmon: (adm1026) Allow 1 as a valid divider value
    - LP: #705045
  * hwmon: (adm1026) Fix setting fan_div
    - LP: #705045
  * amd64_edac: Fix interleaving check
    - LP: #705045
  * IB/uverbs: Handle large number of entries in poll CQ
    - LP: #705045
  * PM / Hibernate: Fix PM_POST_* notification with user-space suspend
    - LP: #705045
  * ACPICA: Fix Scope() op in module level code
    - LP: #705045
  * ACPI: EC: Add another dmi match entry for MSI hardware
    - LP: #705045
  * orinoco: fix TKIP countermeasure behaviour
    - LP: #705045
  * orinoco: clear countermeasure setting on commit
    - LP: #705045
  * x86, amd: Fix panic on AMD CPU family 0x15
    - LP: #705045
  * md: fix bug with re-adding of partially recovered device.
    - LP: #705045
  * tracing: Fix panic when lseek() called on "trace" opened for writing
    - LP: #705045
  * x86, gcc-4.6: Use gcc -m options when building vdso
    - LP: #705045
  * x86: Enable the intr-remap fault handling after local APIC setup
    - LP: #705045
  * x86, vt-d: Handle previous faults after enabling fault handling
    - LP: #705045
  * x86, vt-d: Fix the vt-d fault handling irq migration in the x2apic mode
    - LP: #705045
  * x8...


Changed in linux (Ubuntu Lucid):
status: Fix Committed → Fix Released
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package linux - 2.6.35-27.48

---------------
linux (2.6.35-27.48) maverick-proposed; urgency=low

  [ Steve Conklin ]

  * Release Tracking Bug
    - LP: #723335

  [ Upstream Kernel Changes ]

  * thinkpad-acpi: avoid keymap pitfall
    - LP: #722747

linux (2.6.35-27.47) maverick-proposed; urgency=low

  [ Brad Figg ]

  * Release Tracking Bug
    - LP: #716532

  [ Upstream Kernel Changes ]

  * Revert "USB: gadget: Allow function access to device ID data during
    bind()"
    - LP: #714732
  * net: fix rds_iovec page count overflow, CVE-2010-3865
    - LP: #709153
    - CVE-2010-3865
  * Input: fix typo in keycode validation supporting large scancodes
    - LP: #658198
  * net: ax25: fix information leak to userland, CVE-2010-3875
    - LP: #710714
    - CVE-2010-3875
  * net: ax25: fix information leak to userland harder, CVE-2010-3875
    - LP: #710714
    - CVE-2010-3875
  * net: packet: fix information leak to userland, CVE-2010-3876
    - LP: #710714
    - CVE-2010-3876
  * net: tipc: fix information leak to userland, CVE-2010-3877
    - LP: #711291
    - CVE-2010-3877
  * posix-cpu-timers: workaround to suppress the problems with mt exec,
    CVE-2010-4248
    - LP: #712609
    - CVE-2010-4248
  * sys_semctl: fix kernel stack leakage, CVE-2010-4083
    - LP: #712749
    - CVE-2010-4083
  * thinkpad-acpi: lock down size of hotkey keymap
    - LP: #712174
  * thinkpad-acpi: add support for model-specific keymaps
    - LP: #712174
  * thinkpad-acpi: Add KEY_CAMERA (Fn-F6) for Lenovo keyboards
    - LP: #712174
  * x86, hotplug: Use mwait to offline a processor, fix the legacy case
    - LP: #714732
  * fuse: verify ioctl retries
    - LP: #714732
  * fuse: fix ioctl when server is 32bit
    - LP: #714732
  * ALSA: hda: Use position_fix=1 for Acer Aspire 5538 to enable capture on
    internal mic
    - LP: #685161, #714732
  * ALSA: hda: Use model=lg quirk for LG P1 Express to enable playback and
    capture
    - LP: #595482, #714732
  * drm/radeon/kms: don't apply 7xx HDP flush workaround on AGP
    - LP: #714732
  * drm/kms: remove spaces from connector names (v2)
    - LP: #714732
  * drm/radeon/kms: fix vram base calculation on rs780/rs880
    - LP: #714732
  * nohz: Fix printk_needs_cpu() return value on offline cpus
    - LP: #714732
  * nohz: Fix get_next_timer_interrupt() vs cpu hotplug
    - LP: #714732
  * nfsd: Fix possible BUG_ON firing in set_change_info
    - LP: #714732
  * NFS: Fix fcntl F_GETLK not reporting some conflicts
    - LP: #714732
  * sunrpc: prevent use-after-free on clearing XPT_BUSY
    - LP: #714732
  * hwmon: (adm1026) Allow 1 as a valid divider value
    - LP: #714732
  * hwmon: (adm1026) Fix setting fan_div
    - LP: #714732
  * EDAC: Fix workqueue-related crashes
    - LP: #714732
  * amd64_edac: Fix interleaving check
    - LP: #714732
  * ASoC: Fix swap of left and right channels for WM8993/4 speaker boost
    gain
    - LP: #714732
  * ASoC: Fix off by one error in WM8994 EQ register bank size
    - LP: #714732
  * ASoC: WM8580: Fix R8 initial value
    - LP: #714732
  * ASoC: fix deemphasis control in wm8904/55/60 codecs
    - LP: #714732
  * bootmem: Add alloc_bootmem_...

Changed in linux (Ubuntu Maverick):
status: Fix Committed → Fix Released