linux 2.6.24-28.75 breaks xen flavours (xen kernel bug: 'kernel BUG at /build/buildd/linux-2.6.24/debian/build/custom-source-xen/mm/memory.c:2704')

Bug #620994 reported by Volker on 2010-08-20
This bug affects 9 people
Affects          Status         Importance   Assigned to
linux (Ubuntu)   Invalid        Undecided    Unassigned
Hardy            Fix Released   High         Stefan Bader

Bug Description

Binary package hint: linux-image-2.6.24-28-xen

I have a server running Ubuntu 8.04 (64-bit) and Xen 3.2.

Description: Ubuntu 8.04.4 LTS
Release: 8.04

If I boot the server with the new Xen kernel (version 75), I get a kernel bug and the domUs do not start. I am also unable to do a normal reboot with the "reboot" command, because the machine hangs at "stopping openntpd".

kernel BUG at /build/buildd/linux-2.6.24/debian/build/custom-source-xen/mm/memory.c:2704!

With version 73 and previous versions of the kernel everything is fine.

Volker (volker-reiss) wrote :
mks99 (launchpad-schoenhaber) wrote :

Same for me. DomUs don't start. Not even ps -ef will complete. It seems to hang when listing info about the xend process.
Going back to 2.6.24-28.73 fixes this.

Changed in linux (Ubuntu):
status: New → Confirmed
Loïc Minier (lool) on 2010-08-22
Changed in linux (Ubuntu):
importance: Undecided → Critical
Changed in linux (Ubuntu Hardy):
status: New → Confirmed
importance: Undecided → Critical
Changed in linux (Ubuntu):
status: Confirmed → Invalid
importance: Critical → Undecided
summary: - xen kernel bug: 'kernel BUG at /build/buildd/linux-2.6.24/debian/build
- /custom-source-xen/mm/memory.c:2704'
+ linux 2.6.24-28.75 breaks xen flavours (xen kernel bug: 'kernel BUG at
+ /build/buildd/linux-2.6.24/debian/build/custom-source-
+ xen/mm/memory.c:2704')
Loïc Minier (lool) wrote :

Please note:
- linux-image-xen are in universe
- this is a security update, hence it didn't go through -proposed
- the patches come from upstream

This change from debian/changelog is suspicious as it touches memory.c which is in the assertion reported in this bug:
  * OPENVZ: Fixup patches to memory.c and mlock.c
    - CVE-2010-2240

Loïc Minier (lool) wrote :

So this was uploaded on the evening of the 19th and reported here on the 20th; it didn't get much attention until today (the 22nd), after a post to the kernel-team mailing list.

Given that -xen is in universe, I think this can wait until Monday.

Loïc Minier (lool) wrote :

I have mailed the uploader, and the security team, asking to comment.

Loïc Minier (lool) on 2010-08-22
Changed in linux (Ubuntu Hardy):
importance: Critical → High
Jamie Strandboge (jdstrand) wrote :

While the regression is unfortunate, I think this needs to be dealt with on Monday for the following reasons:
- this kernel flavor is in universe
- staff to properly handle the regression is not on hand but will be in less than 24 hours (indeed, probably 15)
- the suspect patch is for a serious CVE with a known exploit, and pulling the fix would likely affect more users than leaving the patch in

Jamie Strandboge (jdstrand) wrote :

I left one off:
- xen users have a workaround in that they can boot into the previous kernel until the proper fix is found

runout (office-runout) wrote :

How can I go back to version 2.6.24-28.73?

This version is not listed in aptitude, and I can't find this kernel in /boot.
The older version 2.6.24-27 does not boot the domU:
     Error: Kernel image does not exist: /boot/vmlinuz-2.6.24-24-xen

The server is 8.04, 64-bit, Xen 3.3.

Stefan Bader (smb) wrote :

So the problem seems to be the following: for the security issue, a guard page was added. To prevent user-space effects, mlock was changed so that the first page of a VM_GROWSDOWN (stack) vma is excluded from being made present.
What was not expected is that Xen user-space apparently locks areas within the stack. This can cause the vma to be split. If that happens, the remaining vma->vm_start is always equal to the start of the locked range, but that vma does not contain the guard page. Even worse, if the requested size is only one page, we end up calling make_pages_present() with start == end and trigger the BUG() check there.
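
The failure mode described above can be illustrated with a small user-space model. This is only a sketch under stated assumptions, not the kernel code: the `Vma` class, `KernelBug`, `mlock_fixup_model()`, the flag value and the addresses are all hypothetical stand-ins chosen for illustration. Only the logic (skip the first page of a stack vma, then fault in the rest) follows the description.

```python
from dataclasses import dataclass

PAGE_SIZE = 4096
VM_GROWSDOWN = 0x0100  # illustrative flag bit marking a stack vma in this model


class KernelBug(Exception):
    """Stands in for the kernel's BUG() in this user-space model."""


@dataclass
class Vma:
    start: int  # inclusive, page aligned
    end: int    # exclusive, page aligned
    flags: int = 0


def make_pages_present(start: int, end: int) -> int:
    """Model of the check described above: an empty range is a bug."""
    if start >= end:
        raise KernelBug(f"start {start:#x} >= end {end:#x}")
    return (end - start) // PAGE_SIZE  # number of pages faulted in


def mlock_fixup_model(vma: Vma, start: int, end: int) -> int:
    """Lock [start, end) inside vma, skipping an assumed guard page."""
    # After the split, the vma being locked begins exactly at `start`.
    locked = Vma(start, end, vma.flags)
    lock_from = start
    # Flawed assumption: a stack vma that begins at `start` begins with
    # the guard page. After a split this condition always holds, even
    # though the guard page lives at the bottom of the original vma.
    if locked.flags & VM_GROWSDOWN and locked.start == start:
        lock_from += PAGE_SIZE
    return make_pages_present(lock_from, locked.end)


# Illustrative stack vma and lock requests near its top.
stack = Vma(0xBFF50000, 0xBFF70000, VM_GROWSDOWN)
print(mlock_fixup_model(stack, 0xBFF6E000, 0xBFF70000))  # two pages requested
try:
    mlock_fixup_model(stack, 0xBFF6F000, 0xBFF70000)     # one page requested
except KernelBug as bug:
    print("BUG:", bug)
```

In this model, a two-page request faults in only one page, and a one-page request ends up with an empty range and raises, which corresponds to the start == end case described above.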

This is flawed in more recent kernels (Jaunty to Maverick) too. There, though, because of another bug, the effect is rather to accidentally map in the guard page (which would cause the stack to grow each time it gets mlocked) and potentially to lock one page less than desired (see the upstream discussion here: http://kerneltrap.org/mailarchive/linux-kernel/2010/8/22/4609662/thread). The patches mentioned in that discussion should be watched. Linus did not sound too confident about them.

For Hardy, I am currently trying to get test kernels built and will update this bug report once I have uploaded them to a public location.

Stefan Bader (smb) wrote :

Test kernels (64-bit) are now uploaded to http://people.canonical.com/~smb/lp620994/. The 32-bit versions will follow soon. Anybody affected by this bug, please test and give feedback here as soon as possible. Thanks.

Changed in linux (Ubuntu Hardy):
assignee: nobody → Stefan Bader (stefan-bader-canonical)
status: Confirmed → In Progress
Jamie Strandboge (jdstrand) wrote :

Launchpad deleted 2.6.24-28.73 from https://launchpad.net/ubuntu/+source/linux/+publishinghistory, so for people who do not have the earlier kernel anymore, I have made it available in the ubuntu-security-proposed PPA at:
https://launchpad.net/~ubuntu-security-proposed/+archive/ppa

WARNING: this is not actually a proposed kernel (contrary to the name of the PPA) but rather the exact kernel that was published in USN-966-1, copied from the ubuntu-security PPA. Put simply, this does not have the xen regression, but also doesn't have any of the fixes in USN-974-1. Checksums can be found here (the original announcement for USN-966-1): https://lists.ubuntu.com/archives/ubuntu-security-announce/2010-August/001134.html

mks99 (launchpad-schoenhaber) wrote :

I did some (very quick) testing of Stefan's 77~pre1 kernel and didn't encounter the kernel bug.
xend started normally, a DomU could be started without problems.
AFAICT at this time, the problem seems to be fixed.

Stefan Bader (smb) wrote :

Looking at some test results provided on IRC, the first approach may not be sufficient. I currently only handle the case where the vma gets split by the current call to mlock_fixup(). But once a split has been done, locking within this vma again would likely be treated as touching the guard page.
The only real way to prevent this would be to find out whether there is a vma for an address before the requested start, whether it ends at the beginning of the current vma, and whether that vma is a stack vma too. This would require another call to find_vma(), which is slow. Upstream changed the vma code to use a doubly linked list instead of a single chain, thereby avoiding another call to find_vma() when setting up the guard page. But I am not sure we really want to backport this change.
There might be another solution, which is to simply rip out the special handling in mlock. This, however, potentially allows mlock to be called with an address range that includes the guard page, which would cause the stack to grow with mlock. This normally should not happen, as the guard page is hidden from the address range in proc. Still, it might leave a way in for evil code.

I am currently compiling a version 2 of the test kernels, which removes the guard page handling. I will then implement the guard page check that uses find_vma(). Even though it is slower, this sounds like a safer option than backporting the doubly linked list change.
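
The find_vma()-based check outlined above could look roughly like the following user-space model. It is a sketch of the idea as described in this comment, not the patch that was eventually committed: `Vma`, `starts_with_guard_page()`, the flag value and the addresses are hypothetical, and `find_vma()` here is a simple linear search that only imitates the kernel function of the same name (return the first vma whose end lies above the address).

```python
from dataclasses import dataclass
from typing import List, Optional

PAGE_SIZE = 4096
VM_GROWSDOWN = 0x0100  # illustrative flag bit marking a stack vma in this model


@dataclass
class Vma:
    start: int  # inclusive
    end: int    # exclusive
    flags: int = 0


def find_vma(vmas: List[Vma], addr: int) -> Optional[Vma]:
    """Return the first vma (sorted by address) with addr < vma.end."""
    for vma in sorted(vmas, key=lambda v: v.start):
        if addr < vma.end:
            return vma
    return None


def starts_with_guard_page(vmas: List[Vma], vma: Vma) -> bool:
    """True if the first page of `vma` should be treated as the guard page."""
    if not vma.flags & VM_GROWSDOWN:
        return False
    # Look for a vma covering the address just below this one.
    prev = find_vma(vmas, vma.start - 1)
    if (prev is not None and prev is not vma
            and prev.end == vma.start and prev.flags & VM_GROWSDOWN):
        # A stack vma sits directly below: this vma is the upper part of
        # a split stack, so its first page is ordinary stack memory.
        return False
    return True


# A stack that was split by an earlier mlock, plus an unrelated mapping.
heap = Vma(0x08000000, 0x08100000, 0)
lower = Vma(0xBFF50000, 0xBFF6F000, VM_GROWSDOWN)
upper = Vma(0xBFF6F000, 0xBFF70000, VM_GROWSDOWN)
mm = [heap, lower, upper]

for name, vma in (("heap", heap), ("lower", lower), ("upper", upper)):
    print(name, starts_with_guard_page(mm, vma))
```

Only the lowest piece of the split stack is reported as starting with a guard page here. The extra lookup on every call is the cost mentioned above.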

Jamie Strandboge (jdstrand) wrote :

I have reproduced the problem on an i386 Hardy install. Attached is my trace with the .75 kernel. It is slightly different from the original reporter's, but similar.

Jamie Strandboge (jdstrand) wrote :

Stefan has put test kernels in http://people.canonical.com/~smb/lp620994/. Can people try the xen 2.6.24-28.77~pre6 kernel and report back how it works for them?

Jamie Strandboge (jdstrand) wrote :

I tested i386 with the following, and everything seems to work (with /lib/tls moved and not moved to /lib/tls.disabled):
- xen-create-image
- xm create
- xm list
- xm console
- installing package via apt-get within a guest
- ssh access into the guest
- xm shutdown
- xm destroy
- save/restore via reboot with guest running (apache was still listening)

Lesław Kopeć (feydiunn) wrote :

The xen 2.6.24-28.77~pre6 kernel works for me. All DomUs are booting up fine, daemons are up and running and I don't see any suspicious messages in logs. Thanks!

jmedina (jorgearma1982) wrote :

I tested 2.6.24-28.77~pre6 on a few amd64 test machines, and both dom0 and the domUs work fine. The domUs are i386 and amd64. I will do more tests on more servers this weekend.

Thanks Stefan!!!

Jamie Strandboge (jdstrand) wrote :

I have copied 2.6.24-28.77 for hardy to the ubuntu-security-proposed PPA. Can people affected by this bug please test these packages and report back how it works for them?

Jamie Strandboge (jdstrand) wrote :

I forgot to mention that this will be available after the next publishing run, which should be about an hour. Assuming these work ok for people, they will be the packages pushed to hardy-security.

Jamie Strandboge (jdstrand) wrote :

Unfortunately I made a mistake in copying the kernel to the ubuntu-security-proposed ppa. For reasons pertaining to Launchpad that I won't get into here, I have instead copied the 2.6.24-28.77 kernel for hardy to my PPA at https://launchpad.net/~jdstrand/+archive/ppa. Again, assuming these work ok for people, they will be the exact packages pushed to hardy-security. Sorry for the inconvenience.

Lesław Kopeć (feydiunn) wrote :

No problems running amd64 domains inside amd64 Dom0 with the kernel from Jamie's PPA.

ake sandgren (ake-sandgren) wrote :

The kernel from Jamie's PPA fixes the problems I had with pbs_mom startup.
(amd64 linux-image-2.6.24-28-server)

Stefan Bader (smb) wrote :

Changes committed to Hardy repo (need to get merged back to master branch when released)

Changed in linux (Ubuntu Hardy):
status: In Progress → Fix Committed
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package linux - 2.6.24-28.77

---------------
linux (2.6.24-28.77) hardy-security; urgency=low

  [Stefan Bader]

  * mm: Use helper to find real vma with stack guard page
    - LP: #620994
    - CVE-2010-2240
  * mm: Do not assume ENOMEM when looking at a split stack vma
    - LP: #620994
    - CVE-2010-2240
 -- Stefan Bader <email address hidden> Wed, 25 Aug 2010 12:54:28 +0000

Changed in linux (Ubuntu Hardy):
status: Fix Committed → Fix Released
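
For anyone checking which package version they have, the versions named in this report can be summarised in a small helper. This is a hypothetical sketch: `classify()` and the wording of its results are made up for illustration, and it only knows about the three uploads discussed in this thread (.73, .75 and .77).

```python
import re

# Status of the 2.6.24-28.N uploads as reported in this bug.
KNOWN = {
    73: "no Xen regression, but lacks the USN-974-1 fixes",
    75: "affected by this bug",
    77: "fixed",
}


def classify(version: str) -> str:
    """Map a Hardy linux package version string to its status in this report."""
    match = re.fullmatch(r"2\.6\.24-28\.(\d+)(~\w+)?", version)
    if match is None:
        return "not a Hardy 2.6.24-28 version string"
    upload, suffix = int(match.group(1)), match.group(2)
    if suffix:
        # Versions such as 2.6.24-28.77~pre6 were test builds.
        return "pre-release test build, not covered here"
    return KNOWN.get(upload, "not discussed in this report")


for v in ("2.6.24-28.73", "2.6.24-28.75", "2.6.24-28.77"):
    print(v, "->", classify(v))
```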
Adam Porter (alphapapa) wrote :

Just FYI, this botched security patch did not only affect Xen users. It made it impossible for me to run KeePassX, causing a kernel bug, and prevented me from safely rebooting my laptop.

Aug 23 12:33:17 kubbie kernel: [ 719.266981] ------------[ cut here ]------------
Aug 23 12:33:17 kubbie kernel: [ 719.266986] kernel BUG at /build/buildd/linux-2.6.24/mm/memory.c:2667!
Aug 23 12:33:17 kubbie kernel: [ 719.266988] invalid opcode: 0000 [#1] SMP
Aug 23 12:33:17 kubbie kernel: [ 719.266990] Modules linked in: battery ac button tg3 usblp nvidia(P) snd_rtctimer binfmt_misc rfcomm l2cap vboxnetadp vboxnetflt vboxdrv kvm_intel kvm kqemu ppdev ipv6 container dock sbs sbshc acpi_cpufreq cpufreq_conservative cpufreq_userspace cpufreq_stats cpufreq_ondemand freq_table cpufreq_powersave af_packet iptable_filter ip_tables x_tables ext2 aes_i586 dm_crypt coretemp sbp2 parport_pc lp parport loop snd_hda_intel snd_pcm_oss snd_mixer_oss snd_pcm snd_page_alloc snd_hwdep snd_seq_dummy arc4 snd_seq_oss ecb blkcipher snd_seq_midi snd_rawmidi snd_seq_midi_event iwl4965 snd_seq iwlcore lbm_iwl_mac80211 rfkill snd_timer snd_seq_device hci_usb led_class joydev bluetooth snd lbm_iwl_cfg80211 sdhci serio_raw ricoh_mmc wmi_acer intel_agp dcdbas iTCO_wdt video output mmc_core i2c_core agpgart shpchp pci_hotplug iTCO_vendor_support evdev soundcore psmouse pcspkr ext3 jbd mbcache sr_mod cdrom sg ata_generic sd_mod usbhid hid ata_piix ahci pata_acpi libata ohci1394 scsi_mod ieee13
Aug 23 12:33:17 kubbie kernel: 4 ehci_hcd uhci_hcd usbcore dm_mirror dm_snapshot dm_mod thermal processor fan fbcon tileblit font bitblit softcursor fuse
Aug 23 12:33:17 kubbie kernel: [ 719.267052]
Aug 23 12:33:17 kubbie kernel: [ 719.267054] Pid: 21939, comm: keepassx Tainted: P (2.6.24-28-generic #1)
Aug 23 12:33:17 kubbie kernel: [ 719.267056] EIP: 0060:[make_pages_present+0x91/0xa0] EFLAGS: 00010246 CPU: 0
Aug 23 12:33:17 kubbie kernel: [ 719.267061] EIP is at make_pages_present+0x91/0xa0
Aug 23 12:33:17 kubbie kernel: [ 719.267062] EAX: deb67a50 EBX: bff6f000 ECX: 00100173 EDX: ffffffff
Aug 23 12:33:17 kubbie kernel: [ 719.267064] ESI: bff6f000 EDI: bff6f000 EBP: 00000100 ESP: dcff3f34
Aug 23 12:33:17 kubbie kernel: [ 719.267066] DS: 007b ES: 007b FS: 00d8 GS: 0033 SS: 0068
Aug 23 12:33:17 kubbie kernel: [ 719.267068] Process keepassx (pid: 21939, ti=dcff2000 task=dea8c000 task.ti=dcff2000)
Aug 23 12:33:17 kubbie kernel: [ 719.267069] Stack: 00102173 c018064e 000bfffe deb67a50 dc729e18 dc729e18 ffffffff c017f119
Aug 23 12:33:17 kubbie kernel: [ 719.267074] 00000000 00102173 dd61e580 00000000 000bfffe 00000000 dd625e00 bff70000
Aug 23 12:33:17 kubbie kernel: [ 719.267078] bff6f000 00000001 bff6e000 c017f2cc bff6f000 00102173 bff6f000 dc729e18
Aug 23 12:33:17 kubbie kernel: [ 719.267082] Call Trace:
Aug 23 12:33:17 kubbie kernel: [ 719.267090] [split_vma+0xce/0xe0] split_vma+0xce/0xe0
Aug 23 12:33:17 kubbie kernel: [ 719.267104] [mlock_fixup+0xb9/0x130] mlock_fixup+0xb9/0x130
Aug 23 12:33:17 kubbie kernel: [ 719.267123] [do_mlock+0xac/0xe0] do_mlock+0xac/0xe0
Aug 23 12:33:17 kubbie kernel: [ 719.267137] [sys_mlock+0xc7/0xd0] sys_mlock+0xc7/0...
