Ubuntu

libc incorrectly detects AVX support

Reported by Chris J Arges on 2012-04-11
This bug affects 6 people
Affects          Importance  Assigned to
eglibc (Ubuntu)  High        Adam Conrad
Lucid            High        Adam Conrad
Oneiric          High        Adam Conrad
Precise          High        Adam Conrad
Quantal          High        Adam Conrad

Declined for Natty by Chris J Arges

Bug Description

[Impact]
On processors with AVX support, programs running in a virtual machine can execute invalid opcodes and crash.

[Development Fix]
This has been fixed in eglibc in precise. The bug is still present in Lucid, Natty and Oneiric.

[Stable Fix]
A fix can be backported from the cvs-avx-detection.diff patch present in the precise version. This is provided in the below debdiff.

[Test Case]
Please see "How to reproduce" below.

[Regression Potential]
This patch affects amd64 versions of eglibc, and in particular processors that have the AVX extension. This patch adds more complete checks for AVX enablement.

--

* Description of the problem:

libc incorrectly detects whether AVX is enabled. On processors with AVX support, such as the Xeon E31270, libc does not check thoroughly enough whether AVX is actually enabled. The problem shows up in virtual machines using the affected version of eglibc when the host machine is running Xen and has an AVX-capable CPU.
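
For illustration only, a more complete check of the kind described above might look like the following C sketch. This is not the eglibc patch; the function and macro names are made up for this example. The underlying facts are architectural: the AVX bit in CPUID leaf 1 only says the CPU implements AVX, while the OS or hypervisor must also have set OSXSAVE and enabled both XMM and YMM state in XCR0. If either is missing, executing an AVX instruction raises an invalid-opcode trap.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#if defined(__x86_64__) || defined(__i386__)
#include <cpuid.h>   /* GCC/Clang helper for the CPUID instruction */
#endif

/* CPUID leaf 1, ECX feature bits. */
#define CPUID1_ECX_OSXSAVE (1u << 27)  /* OS has enabled XSAVE/XGETBV */
#define CPUID1_ECX_AVX     (1u << 28)  /* CPU implements AVX */

/* XCR0 state-component bits. */
#define XCR0_SSE (1u << 1)             /* XMM register state enabled */
#define XCR0_AVX (1u << 2)             /* upper YMM register state enabled */

static_assert((XCR0_SSE | XCR0_AVX) == 0x6, "mask must cover XCR0 bits 1 and 2");

/* Hypothetical helper: pure decision from raw register values.
 * AVX may be used only if the CPU has it, the OS enabled XSAVE,
 * and XCR0 shows both XMM and YMM state enabled. */
bool avx_usable(uint32_t cpuid1_ecx, uint64_t xcr0)
{
    if (!(cpuid1_ecx & CPUID1_ECX_AVX))
        return false;
    if (!(cpuid1_ecx & CPUID1_ECX_OSXSAVE))
        return false;
    return (xcr0 & (XCR0_SSE | XCR0_AVX)) == (XCR0_SSE | XCR0_AVX);
}

/* Hypothetical helper: query the CPU this code is running on. */
bool avx_usable_on_this_cpu(void)
{
#if defined(__x86_64__) || defined(__i386__)
    unsigned int eax, ebx, ecx, edx;
    uint32_t lo, hi;

    if (!__get_cpuid(1, &eax, &ebx, &ecx, &edx))
        return false;
    /* XGETBV itself is an invalid opcode unless OSXSAVE is set. */
    if (!(ecx & CPUID1_ECX_OSXSAVE))
        return false;
    __asm__ volatile ("xgetbv" : "=a"(lo), "=d"(hi) : "c"(0));
    return avx_usable(ecx, ((uint64_t)hi << 32) | lo);
#else
    return false;  /* not an x86 build */
#endif
}
```

Calling avx_usable_on_this_cpu() from a small main() inside the affected guest would be one way to see whether the CPUID bit and the enabled state disagree. Splitting the decision into a pure function keeps it checkable with fixed register values.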

This bugreport explains the problem well: http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=649349

* Versions Affected:

The problem is in lucid, eglibc-2.11.1-0ubuntu7.10, x86_64.
The problem is also in current versions of eglibc available for Natty, Maverick and Oneiric.

The problem is patched upstream in debian unstable eglibc 2.13-22 which made it into precise eglibc 2.13-23ubuntu1:
https://launchpad.net/ubuntu/+source/eglibc/2.13-23ubuntu1

There is a patch backported for glibc 2.11 provided by avx-fix.patch here:
http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=646549

* How to reproduce:

The problem may only be reproducible on particular hardware that supports AVX being used as a Xen host.
It is reproducible when trying to start apache. The program exits when execution of an AVX instruction fails.

Start Apache on a Lucid guest VM where the host machine is running Xen and has a Xeon E31270 model cpu.

* What happens?
Apache exits with an error:
apache2[858] trap invalid opcode ip:7ffcebfdf920 sp:7fffc6da6798 error:0 in ld-2.11.1.so[7ffcebfca000+20000]
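
As a rough aid for reading a message like the one above, the following C sketch pulls out the instruction pointer and the mapping range and computes how far into the mapped object the fault occurred. The struct and function names are hypothetical, and the parser assumes the exact field layout of this one line. For the line above the offset works out to 0x15920 (0x7ffcebfdf920 - 0x7ffcebfca000), which lies inside the 0x20000-byte mapping of ld-2.11.1.so. Assuming the mapping begins at the object's load base, that offset could be compared against a disassembly of the dynamic loader to find the faulting instruction.

```c
#include <assert.h>
#include <inttypes.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical container for the fields of a kernel trap message. */
struct trap_info {
    uint64_t ip;      /* faulting instruction pointer */
    uint64_t base;    /* start address of the mapping named in the message */
    uint64_t size;    /* length of that mapping */
    char object[64];  /* name of the mapped object, e.g. ld-2.11.1.so */
};

/* Parse "... ip:HEX sp:HEX error:N in NAME[BASE+SIZE]".
 * Returns false if the line does not have that shape. */
bool parse_trap_line(const char *line, struct trap_info *out)
{
    assert(line != NULL && out != NULL);

    const char *p = strstr(line, "ip:");
    if (p == NULL || sscanf(p, "ip:%" SCNx64, &out->ip) != 1)
        return false;

    p = strstr(p, " in ");
    if (p == NULL)
        return false;

    /* %63[^[] reads the object name up to the opening bracket. */
    return sscanf(p, " in %63[^[][%" SCNx64 "+%" SCNx64 "]",
                  out->object, &out->base, &out->size) == 3;
}

/* Compute the offset of the fault inside the mapping.
 * Returns false if ip is outside [base, base + size). */
bool trap_offset(const struct trap_info *t, uint64_t *offset)
{
    assert(t != NULL && offset != NULL);

    if (t->ip < t->base || t->ip - t->base >= t->size)
        return false;
    *offset = t->ip - t->base;
    return true;
}
```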

* What is expected?
Apache starts normally.

Chris J Arges (arges) wrote :

Attached is a patch that fixes this issue for lucid.

Chris J Arges (arges) on 2012-04-11
description: updated
Chris J Arges (arges) on 2012-04-11
Changed in eglibc (Ubuntu):
assignee: nobody → Chris J Arges (christopherarges)
Steve Langasek (vorlon) wrote :

Adam, would you mind chasing this up?

Changed in eglibc (Ubuntu):
assignee: Chris J Arges (christopherarges) → Adam Conrad (adconrad)
Adam Conrad (adconrad) wrote :

Will have a look at SRUing for this, sure.

Adam Conrad (adconrad) wrote :

Turns out that we've found still more issues with AVX detection upstream. Carlos is working on testing a refined patchset right now, and if that all goes smoothly, I'll be backporting this across the board.

Chris J Arges (arges) wrote :

@adam,
thanks

Micah Gersten (micahg) wrote :

Removing from sponsorship queue as Adam has taken ownership of this

Uli Stärk (uli-staerk) wrote :

Why is this taking so long? The 12.04 release is completely broken on the new AMD servers. I would classify this as a major compatibility issue.

Adam Conrad (adconrad) wrote :

It's taken some time upstream to properly fix AVX and FMA4 issues (the last comment in this bug log is actually about FMA4, not AVX, though the two relate).

We have bits landed upstream for 2.15, which I'll be getting into quantal and precise ASAP. Backporting to older releases will be a bit more challenging, but I'll jump on that next.

Chris J Arges (arges) on 2012-06-08
Changed in eglibc (Ubuntu Lucid):
status: New → Incomplete
status: Incomplete → Confirmed
importance: Undecided → High
Changed in eglibc (Ubuntu):
importance: Medium → High
status: Confirmed → In Progress
Changed in eglibc (Ubuntu Precise):
status: New → In Progress
importance: Undecided → High
assignee: nobody → Adam Conrad (adconrad)
Changed in eglibc (Ubuntu Lucid):
assignee: nobody → Chris J Arges (christopherarges)
Chris J Arges (arges) on 2012-06-29
Changed in eglibc (Ubuntu Precise):
milestone: none → ubuntu-12.04.1
Nazar Mokrynskyi (nazar-pc) wrote :

I think this is the same problem as in https://bugs.launchpad.net/ubuntu/+source/eglibc/+bug/956051

In addition it affects 12.10 Quantal

Chris J Arges (arges) on 2012-08-09
Changed in eglibc (Ubuntu Oneiric):
status: New → Confirmed
importance: Undecided → High
assignee: nobody → Chris J Arges (christopherarges)
Changed in eglibc (Ubuntu Precise):
milestone: ubuntu-12.04.1 → precise-updates
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package eglibc - 2.15-0ubuntu16

---------------
eglibc (2.15-0ubuntu16) quantal; urgency=low

  * Backport fix from 2.16 to fix htons() conversion errors on non-x86
    architectures, by correctly casting to uint16_t (LP: #1016349)
  * Restore missing AT_EMPTY_PATH definition in fcntl.h (LP: #1010069)
  * Backport FMA4/AVX detection from glibc 2.16 (LP: #956051, #979003)
  * Backport fixups to AVX-using code to match the detection backport.
 -- Adam Conrad <email address hidden> Thu, 09 Aug 2012 15:15:53 -0600

Changed in eglibc (Ubuntu Quantal):
status: In Progress → Fix Released
Adam Stokes (adam-stokes) wrote :

Backport of quantal's AVX changes to precise

Chris J Arges (arges) wrote :

I've built a do-no-harm fix just for the original AVX bug using the patch identified in #1. If somebody can verify this on AMD hardware (with AVX extension) that would be much appreciated.

Here is the package in my PPA:
https://launchpad.net/~christopherarges/+archive/ppa-test/+sourcepub/2629659/+listing-archive-extra

Adam Conrad (adconrad) wrote :

To be clear, the precise fix for this has been sitting in the precise-proposed queue since August 10th. This isn't being ignored; it just needs some review love before it can be accepted.

Hello Chris, or anyone else affected,

Accepted eglibc into precise-proposed. The package will build now and be available at http://launchpad.net/ubuntu/+source/eglibc/2.15-0ubuntu10.1 in a few hours, and then in the -proposed repository.

Please help us by testing this new package. See https://wiki.ubuntu.com/Testing/EnableProposed for documentation how to enable and use -proposed. Your feedback will aid us getting this update out to other Ubuntu users.

If this package fixes the bug for you, please change the bug tag from verification-needed to verification-done. If it does not, change the tag to verification-failed. In either case, details of your testing will help us make a better decision.

Further information regarding the verification process can be found at https://wiki.ubuntu.com/QATeam/PerformingSRUVerification . Thank you in advance!

Changed in eglibc (Ubuntu Precise):
status: In Progress → Fix Committed
tags: added: verification-needed
Chris J Arges (arges) on 2012-10-02
tags: added: verification-done
removed: verification-needed
Chris J Arges (arges) wrote :

On an AMD bulldozer/fma4:
I have verified the precise -proposed package 2.15-0ubuntu10.1 and it does fix the issue. I can use 'xm list', boot a lucid domU, and run apache.
I have also verified the do-no-harm package from comment #12. It likewise lets me use 'xm list', boot a lucid domU machine, and run apache.

Brian Murray (brian-murray) wrote :

Hello Chris, or anyone else affected,

Accepted eglibc into precise-proposed. The package will build now and be available at http://launchpad.net/ubuntu/+source/eglibc/2.15-0ubuntu10.3 in a few hours, and then in the -proposed repository.

Please help us by testing this new package. See https://wiki.ubuntu.com/Testing/EnableProposed for documentation how to enable and use -proposed. Your feedback will aid us getting this update out to other Ubuntu users.

If this package fixes the bug for you, please change the bug tag from verification-needed to verification-done. If it does not, change the tag to verification-failed. In either case, details of your testing will help us make a better decision.

Further information regarding the verification process can be found at https://wiki.ubuntu.com/QATeam/PerformingSRUVerification . Thank you in advance!

tags: removed: verification-done
tags: added: verification-needed
Adam Conrad (adconrad) wrote :

Verified that the precise-proposed packages do the right thing for AVX/FMA4 detection with third party testers.

tags: added: verification-done
removed: verification-needed

The verification of this Stable Release Update has completed successfully and the package has now been released to -updates. Subsequently, the Ubuntu Stable Release Updates Team is being unsubscribed and will not receive messages about this bug report. In the event that you encounter a regression using the package from -updates, please report a new bug using ubuntu-bug and tag the bug report regression-update so we can easily find any regressions.

Chris J Arges (arges) wrote :

Hi, I'm wondering whether the Lucid fix (identified in #1) could be uploaded. I verified it in comment #15 on an AMD machine with those extensions as well as on an Intel machine.

Adam Conrad (adconrad) wrote :

This bug was fixed in the package eglibc - 2.15-0ubuntu10.3

---------------
eglibc (2.15-0ubuntu10.3) precise; urgency=low

  * Backport fixes for dbl-64 and ldbl-128 issues (LP: #1000498)
  * Backport another FMA support patch from glibc master branch.

eglibc (2.15-0ubuntu10.2) precise-security; urgency=low

  * SECURITY UPDATE: stack buffer overflow in vfprintf handling
    (LP: #1031301)
    - debian/patches/any/CVE-2012-3406.patch: switch to malloc when
      array grows too large to handle via alloca extension
    - CVE-2012-3406
  * SECURITY UPDATE: stdlib strtod integer/buffer overflows
    - debian/patches/any/CVE-2012-3480.patch: rearrange calculations
      and modify types to avoid integer overflows
    - CVE-2012-3480

eglibc (2.15-0ubuntu10.1) precise; urgency=low

  * Backport fix from 2.16 to fix htons() conversion errors on non-x86
    architectures, by correctly casting to uint16_t (LP: #1016349)
  * Restore missing AT_EMPTY_PATH definition in fcntl.h (LP: #1010069)
  * Backport FMA4/AVX detection from glibc 2.16 (LP: #956051, #979003)
  * Backport fixups to AVX-using code to match the detection backport.
  * Backport fix from 2.16 for sscanf/realloc deadlock (LP: #1028038)
  * Backport for bogus FPE on underflow for exp(double) (LP: #1007457)
 -- Adam Conrad <email address hidden> Wed, 03 Oct 2012 15:58:02 -0600

Changed in eglibc (Ubuntu Precise):
status: Fix Committed → Fix Released
Adam Conrad (adconrad) on 2012-11-15
Changed in eglibc (Ubuntu Oneiric):
assignee: Chris J Arges (christopherarges) → Adam Conrad (adconrad)
Changed in eglibc (Ubuntu Lucid):
assignee: Chris J Arges (christopherarges) → Adam Conrad (adconrad)

Hello Chris, or anyone else affected,

Accepted eglibc into oneiric-proposed. The package will build now and be available at http://launchpad.net/ubuntu/+source/eglibc/2.13-20ubuntu5.3 in a few hours, and then in the -proposed repository.

Please help us by testing this new package. See https://wiki.ubuntu.com/Testing/EnableProposed for documentation how to enable and use -proposed. Your feedback will aid us getting this update out to other Ubuntu users.

If this package fixes the bug for you, please change the bug tag from verification-needed to verification-done. If it does not, change the tag to verification-failed. In either case, details of your testing will help us make a better decision.

Further information regarding the verification process can be found at https://wiki.ubuntu.com/QATeam/PerformingSRUVerification . Thank you in advance!

Changed in eglibc (Ubuntu Oneiric):
status: Confirmed → Fix Committed
tags: removed: verification-done
tags: added: verification-needed
Brian Murray (brian-murray) wrote :

Hello Chris, or anyone else affected,

Accepted eglibc into lucid-proposed. The package will build now and be available at http://launchpad.net/ubuntu/+source/eglibc/2.11.1-0ubuntu7.12 in a few hours, and then in the -proposed repository.

Please help us by testing this new package. See https://wiki.ubuntu.com/Testing/EnableProposed for documentation how to enable and use -proposed. Your feedback will aid us getting this update out to other Ubuntu users.

If this package fixes the bug for you, please change the bug tag from verification-needed to verification-done. If it does not, change the tag to verification-failed. In either case, details of your testing will help us make a better decision.

Further information regarding the verification process can be found at https://wiki.ubuntu.com/QATeam/PerformingSRUVerification . Thank you in advance!

Changed in eglibc (Ubuntu Lucid):
status: Confirmed → Fix Committed
Bartosz Kosiorek (gang65) wrote :

I verified it on my Bulldozer processor, and it works perfectly without crash.

tags: added: verification-done-lucid
Chris J Arges (arges) wrote :

Verified this on a Sandybridge processor with AVX extensions using a lucid domU. I had to install xen 4.2 to ensure I could see the AVX extensions in the guest.

Chris J Arges (arges) wrote :

Verified this on a Sandybridge processor with AVX extensions using an oneiric domU, (had to boot with xen_emul_unplug=never). Using xen 4.2 to see proper avx extensions.
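
One possible way to confirm that a guest sees the AVX extensions is to look for the avx flag in /proc/cpuinfo. The thread does not say how this was checked, so the C sketch below is an assumption about method, and the function names are hypothetical. It matches whole tokens only, so that a flag such as avx2 is not mistaken for avx.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: true if `flag` appears as a whole
 * whitespace-delimited token in a cpuinfo "flags" line. */
bool flags_line_has(const char *flags, const char *flag)
{
    assert(flags != NULL && flag != NULL);

    size_t n = strlen(flag);
    if (n == 0)
        return false;

    const char *p = flags;
    while ((p = strstr(p, flag)) != NULL) {
        /* The match must be bounded by separators on both sides. */
        bool starts = (p == flags) || p[-1] == ' ' || p[-1] == '\t' || p[-1] == ':';
        bool ends = p[n] == '\0' || p[n] == ' ' || p[n] == '\t' || p[n] == '\n';
        if (starts && ends)
            return true;
        p += n;  /* skip this partial match, e.g. "avx" inside "avx2" */
    }
    return false;
}

/* Hypothetical helper: scan a cpuinfo-style file for the flag.
 * Returns false if the file cannot be opened. */
bool cpuinfo_has_flag(const char *path, const char *flag)
{
    FILE *f = fopen(path, "r");
    if (f == NULL)
        return false;

    char line[8192];  /* flags lines can be long on recent CPUs */
    bool found = false;
    while (!found && fgets(line, sizeof line, f) != NULL) {
        if (strncmp(line, "flags", 5) == 0)
            found = flags_line_has(line, flag);
    }
    fclose(f);
    return found;
}
```

Inside a guest, cpuinfo_has_flag("/proc/cpuinfo", "avx") would report whether the flag is visible there. The flag being listed does not by itself prove that the YMM state is enabled, so this complements a CPUID/XGETBV check and does not replace it.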

tags: added: verification-done-oneiric
removed: verification-needed
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package eglibc - 2.13-20ubuntu5.3

---------------
eglibc (2.13-20ubuntu5.3) oneiric; urgency=low

  * Pull three interdependent patches from Debian to fix AVX detection
    problems on kernels or CPUs that lack support for it (LP: #979003):
    - amd64/cvs-avx-detection.diff: Improved detection on old kernels.
    - amd64/cvs-dl_trampoline-cfi.diff: fix CFI in dl_trampoline code.
    - amd64/cvs-avx-osxsave.diff: Disable AVX without OSXSAVE support.
 -- Adam Conrad <email address hidden> Wed, 14 Nov 2012 16:03:25 -0700

Changed in eglibc (Ubuntu Oneiric):
status: Fix Committed → Fix Released
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package eglibc - 2.11.1-0ubuntu7.12

---------------
eglibc (2.11.1-0ubuntu7.12) lucid; urgency=low

  * Pull three interdependent patches from Debian to fix AVX detection
    problems on kernels or CPUs that lack support for it (LP: #979003):
    - amd64/cvs-avx-detection.diff: Improved detection on old kernels.
    - amd64/cvs-dl_trampoline-cfi.diff: fix CFI in dl_trampoline code.
    - amd64/cvs-avx-osxsave.diff: Disable AVX without OSXSAVE support.
  * Also backport amd64/submitted-tst-audit6-avx.diff from oneiric to
    skip tests if AVX extensions are not available on the build host.
  * Use non-deprecated --reject-format=unified QUILT_PATCH_OPTS option.
 -- Adam Conrad <email address hidden> Wed, 14 Nov 2012 16:14:37 -0700

Changed in eglibc (Ubuntu Lucid):
status: Fix Committed → Fix Released
Justin Baugh (baughj-y) wrote :

This change really needs to be pushed to main; debootstrap and friends aren't smart enough to consider *-updates repositories, which (for instance) renders xen-create-image useless on a Precise system with a CPU that triggers the original bug.
