xenial: invalid opcode when using llvmpipe

Bug #1564156 reported by Jason Gerard DeRose on 2016-03-30
This bug affects 8 people
Affects                        Importance   Assigned to
OEM Priority Project           Undecided    Unassigned
Release Notes for Ubuntu       Undecided    Unassigned
System76                       Critical     Jason Gerard DeRose
llvm-toolchain-3.8 (Ubuntu)    Critical     Timo Aaltonen
  Xenial                       Critical     Timo Aaltonen
mesa (Ubuntu)                  Undecided    Unassigned
  Xenial                       Undecided    Unassigned

Bug Description

[Description updated to reflect state of 16.04 release ISO]

== In summary ==

If you have an Intel Skylake (6th gen) CPU and an NVIDIA GPU (or possibly another GPU that likewise requires the llvmpipe OpenGL software fallback), a workaround is needed to install Ubuntu 16.04 desktop.

To work around this, you'll need to:

1) Choose "Install Ubuntu" in the pre-boot menu (rather than "Try Ubuntu without installing")

2) Check "Download updates while installing Ubuntu"

** Note: "Download updates while installing Ubuntu" doesn't currently seem to be working. If after installing 16.04 on affected Skylake hardware you find that Unity/Compiz is broken, switch to a VT with Control+Alt+F1, log in, and then run:

sudo apt-get update
sudo apt-get dist-upgrade

You should see the `libllvm3.8` package get updated. After a reboot, Unity/Compiz should be working.

== In detail ==

The Ubuntu 16.04 desktop ISOs include libllvm3.8 1:3.8-2ubuntu1, which has a bug that results in invalid JIT code generation when using the Mesa llvmpipe OpenGL software fallback on Skylake CPUs.

When you encounter this bug, Unity/Compiz will fail to start, and you'll see something like this in dmesg:

[ 2092.557913] traps: compiz[10155] trap invalid opcode ip:7efc940030d4 sp:7ffccd914ea0 error:0
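To check for this trap from a VT, the kernel log can be filtered for that line. This is a minimal sketch, assuming the message has the same format as the one quoted above; the function name `has_compiz_trap` is only illustrative.

```shell
#!/bin/sh
# Succeeds if the kernel log text on stdin contains an invalid-opcode trap
# for compiz, in the format quoted in this report.
has_compiz_trap() {
    grep -Eq 'traps: compiz\[[0-9]+\] trap invalid opcode'
}

# Usage on a live system (dmesg may need sudo if access is restricted):
#   dmesg | has_compiz_trap && echo "compiz hit an invalid opcode"
```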

libllvm3.8 1:3.8-2ubuntu3 fixes this issue, but the fix did not make it onto the 16.04 release ISOs. It will be included on the 16.04.1 ISOs.
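To confirm which side of the fix an installed system is on, the installed libllvm3.8 version can be compared against 1:3.8-2ubuntu3. This is a rough sketch using `dpkg --compare-versions`; the function name is hypothetical. Note that the interim test package mentioned in the comments (3.8-2ubuntu1.1) sorts lower than 1:3.8-2ubuntu3, so it would not pass this check even though testers reported it working.

```shell
#!/bin/sh
# First archive version that this report says contains the fix.
FIXED_VERSION='1:3.8-2ubuntu3'

# Succeeds if the given Debian version string is at least FIXED_VERSION.
llvm_version_is_fixed() {
    dpkg --compare-versions "$1" ge "$FIXED_VERSION"
}

# Usage on an installed system:
#   installed=$(dpkg-query -W -f='${Version}\n' libllvm3.8 | head -n 1)
#   if llvm_version_is_fixed "$installed"; then
#       echo "libllvm3.8 $installed includes the fix"
#   else
#       echo "libllvm3.8 $installed predates the fix"
#   fi
```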

description: updated
Changed in system76:
status: New → Triaged
assignee: nobody → Jason Gerard DeRose (jderose)
importance: Undecided → Critical
description: updated
Timo Aaltonen (tjaalton) wrote :

moving to mesa until it's better known, might be triggered by llvm 3.8 and related to bug #1553174

affects: compiz (Ubuntu) → mesa (Ubuntu)
Jason Gerard DeRose (jderose) wrote :

So there seems to be a curious pattern with this bug (which might provide an important hint): on desktop systems with 970 GTX or 980 GTX cards, this bug always seems to happen when connected to a monitor over DisplayPort, but does *not* seem to happen when connected to a monitor over HDMI.

We'll investigate this more carefully as soon as there's a new Xenial daily build available (with the 4.4.0-18-generic kernel).

Jason Gerard DeRose (jderose) wrote :

Oh, and another bit of information: our laptops all use embedded Display Port for their connection to the internal screen, and in this case, this bug always manifests.

So at least in our testing so far, both eDP and DP seem to be affected.

Chih-Hsyuan Ho (chih) on 2016-04-14
affects: mesa (Ubuntu) → oem-priority
description: updated
Steve Langasek (vorlon) wrote :

Have discussed on IRC whether mesa should be rebuilt with llvm-3.6 as a workaround, as suggested. This is not a straight rebuild with an older toolchain; the upgrade to llvm-3.8 was needed for OpenGL 4.1 support (LP: #1535500), so this would be a behavior regression and invalidate testing since this change was made in xenial.

And that change was made in xenial at the beginning of February (Feb 9), so has been baking in xenial for 2 months without prior reports of this problem. So it seems likely that we'll be able to pin this on a later change in mesa, rather than just on llvm-3.8. This may ultimately be an llvm-3.8 bug, but we may be able to identify a smaller workaround.

Timo Aaltonen (tjaalton) wrote :

this happens with 11.1.x + llvm-3.8 too, so the bug should be in llvm as any mesa bisect attempt hasn't helped

no longer affects: mesa
Timo Aaltonen (tjaalton) wrote :

similar segfaults happen with mesa llvmpipe tests when built on skylake, but not with llvm-3.9 snapshot. So the issue is with detecting skylake features. I'll try disabling AVX512 in llvm to see if that works around this issue.

It doesn't explain failure on HSW-E though, maybe it's trying to enable another feature that isn't there?

Timo Aaltonen (tjaalton) wrote :

just dropping AVX512 didn't fix skylake, but looking at current lib/Target/X86/X86.td there are several features getting enabled which are separated in 3.9 to apply only to server CPUs

Changed in llvm-toolchain-3.8 (Ubuntu):
assignee: nobody → Timo Aaltonen (tjaalton)
status: New → In Progress
Ian Santopietro (isantop) wrote :

Attaching CPU info from two affected Desktop systems.

Ian Santopietro (isantop) wrote :

Second affected system

Timo Aaltonen (tjaalton) wrote :

this should be fixed with libllvm3.8 available on http://koti.kapsi.fi/~tjaalton/skl/build2
which dropped AVX512 and its subvariants from skylake

but the haswell thing is probably something else..
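Since the test build works by no longer assuming AVX512 on Skylake, it can help to check whether a given CPU advertises AVX512 at all. This is a small sketch, assuming Linux's /proc/cpuinfo layout and that `avx512f` is the flag for the AVX-512 foundation instructions; the function name is made up for illustration.

```shell
#!/bin/sh
# Succeeds if the /proc/cpuinfo-style text on stdin lists the avx512f flag
# on its first "flags" line.
cpu_reports_avx512() {
    grep -m1 '^flags' | grep -qw 'avx512f'
}

# Usage:
#   if cpu_reports_avx512 < /proc/cpuinfo; then
#       echo "CPU reports AVX512"
#   else
#       echo "CPU does not report AVX512"
#   fi
```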

Jason Gerard DeRose (jderose) wrote :

I can confirm that tjaalton's above test packages fix the problem on a Skylake laptop with an i7-6700 CPU and an Nvidia 970m GPU when using the nouveau driver.

I installed libllvm3.8_3.8-2ubuntu1.1_amd64.deb from a VT then rebooted, and now I have working Unity again.

Jason Gerard DeRose (jderose) wrote :

After further investigation, it seems this bug doesn't affect Haswell-E after all, sorry about the confusion.

Will continue to report back as we learn more.

Hello Jason, or anyone else affected,

Accepted llvm-toolchain-3.8 into xenial-proposed. The package will build now and be available at https://launchpad.net/ubuntu/+source/llvm-toolchain-3.8/1:3.8-2ubuntu3 in a few hours, and then in the -proposed repository.

Please help us by testing this new package. See https://wiki.ubuntu.com/Testing/EnableProposed for documentation how to enable and use -proposed. Your feedback will aid us getting this update out to other Ubuntu users.

If this package fixes the bug for you, please add a comment to this bug, mentioning the version of the package you tested, and change the tag from verification-needed to verification-done. If it does not fix the bug for you, please add a comment stating that, and change the tag to verification-failed. In either case, details of your testing will help us make a better decision.

Further information regarding the verification process can be found at https://wiki.ubuntu.com/QATeam/PerformingSRUVerification . Thank you in advance!

Changed in llvm-toolchain-3.8 (Ubuntu Xenial):
status: In Progress → Fix Committed
tags: added: verification-needed
Jason Gerard DeRose (jderose) wrote :

Just confirmed that llvm-toolchain-3.8 from proposed fixes this issue on Skylake hardware using llvmpipe.

Thanks!

Changed in system76:
status: Triaged → Fix Committed
tags: added: verification-done
removed: verification-needed
description: updated
Changed in ubuntu-release-notes:
status: New → Fix Released
Joakim Koed (vooze) wrote :

Hi, glad to see this has been solved. How long till I can find it in updates? Can I add xenial-proposed during install?

Ara Pulido (apulido) on 2016-04-22
Changed in oem-priority:
status: New → Fix Committed
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package llvm-toolchain-3.8 - 1:3.8-2ubuntu3

---------------
llvm-toolchain-3.8 (1:3.8-2ubuntu3) xenial; urgency=medium

  * rules: Allow LLVM tests to fail on i386, we know dropping AVX512 breaks
    them.

llvm-toolchain-3.8 (1:3.8-2ubuntu2) xenial; urgency=medium

  * drop-avx512-from-skylake.diff: Don't enable AVX512 on Skylake, as it's
    a server cpu feature and breaks llvmpipe on workstations. (LP: #1564156)

 -- Timo Aaltonen <email address hidden> Thu, 21 Apr 2016 08:16:04 +0300

Changed in llvm-toolchain-3.8 (Ubuntu):
status: Fix Committed → Fix Released
Joakim Koed (vooze) wrote :

Also just tried with the .deb from http://koti.kapsi.fi/~tjaalton/skl/build2 worked perfectly.

To future users: add nomodeset and then use wget to get it and install it that way. Took me a while to figure out why tty1-6 was not working ;)

Joel Heinzel (joelheinzel) wrote :

Any chance on providing more instructions on how to resolve?

I thought I had installed the above deb successfully, but it didn't seem to make a difference. Most likely I've not done something right.

Many thanks!

Launchpad Janitor (janitor) wrote :

Status changed to 'Confirmed' because the bug affects multiple users.

Changed in mesa (Ubuntu Xenial):
status: New → Confirmed
Changed in mesa (Ubuntu):
status: New → Confirmed
Joakim Koed (vooze) wrote :

Joel, I did this:

Boot with nomodeset: press E at the GRUB menu, add nomodeset after "quiet splash", and press F10 (if I remember correctly). Then switch to a terminal with CTRL + ALT + F2 or similar.

wget http://koti.kapsi.fi/~tjaalton/skl/build2/libllvm3.8_3.8-2ubuntu1.1_amd64.deb
sudo dpkg -i libllvm3.8_3.8-2ubuntu1.1_amd64.deb
sudo reboot

Joel Heinzel (joelheinzel) wrote :

Thank you Joakim!

Changed in llvm-toolchain-3.8 (Ubuntu):
importance: Undecided → Critical
Changed in llvm-toolchain-3.8 (Ubuntu Xenial):
importance: Undecided → Critical

Hi!

Would this bug cause my system to crash when using an external monitor via HDMI? I can attach and use the monitor with no problem UNTIL I suspend and/or detach and reattach the monitor, at which point the whole system crashes: the screen freezes, the keyboard, touchpad and network are all unresponsive, and I have no choice but to do a hard reboot.

I've installed all the toolchain debs found here: http://koti.kapsi.fi/~tjaalton/skl/build2 and my system still crashes with the external monitor (as described above).

Some system specs:

Asus Zenbook UX305CA -EHM1
CPU: Intel® Core™ m3-6Y30 CPU @ 0.90GHz × 4
GPU: Intel® HD Graphics 515 (Skylake GT2)

Thanks!

Launchpad Janitor (janitor) wrote :

This bug was fixed in the package llvm-toolchain-3.8 - 1:3.8-2ubuntu3

---------------
llvm-toolchain-3.8 (1:3.8-2ubuntu3) xenial; urgency=medium

  * rules: Allow LLVM tests to fail on i386, we know dropping AVX512 breaks
    them.

llvm-toolchain-3.8 (1:3.8-2ubuntu2) xenial; urgency=medium

  * drop-avx512-from-skylake.diff: Don't enable AVX512 on Skylake, as it's
    a server cpu feature and breaks llvmpipe on workstations. (LP: #1564156)

 -- Timo Aaltonen <email address hidden> Thu, 21 Apr 2016 08:16:04 +0300

Changed in llvm-toolchain-3.8 (Ubuntu Xenial):
status: Fix Committed → Fix Released

The verification of the Stable Release Update for llvm-toolchain-3.8 has completed successfully and the package has now been released to -updates. Subsequently, the Ubuntu Stable Release Updates Team is being unsubscribed and will not receive messages about this bug report. In the event that you encounter a regression using the package from -updates please report a new bug using ubuntu-bug and tag the bug report regression-update so we can easily find any regressions.

description: updated
Changed in system76:
status: Fix Committed → Fix Released

I am trying to boot from a 16.04 live USB and it is hanging up when I say "Install Ubuntu" or "Try Ubuntu without installing". Will there be a new ISO released with the fix?

Jason Gerard DeRose (jderose) wrote :

Abhinav,

Sounds like you're encountering a different bug, or perhaps your ISO file is corrupted. Have you verified the checksum of your download?
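For anyone unsure how to do that check, here is a minimal sketch that compares a file's SHA-256 digest with the published one. The function name, ISO file name and digest shown in the usage comment are placeholders, not values from this report.

```shell
#!/bin/sh
# Succeeds if the SHA-256 digest of the file ($1) equals the expected
# lowercase hex digest ($2).
iso_checksum_ok() {
    file=$1
    expected=$2
    actual=$(sha256sum "$file" | cut -d ' ' -f 1)
    [ "$actual" = "$expected" ]
}

# Usage (file name and digest are placeholders):
#   iso_checksum_ok ubuntu-16.04-desktop-amd64.iso "<digest from SHA256SUMS>" \
#       && echo "checksum matches" || echo "checksum MISMATCH"
```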

description: updated
Jason Gerard DeRose (jderose) wrote :

Hmm, I just tried this on Skylake hardware, and "Download updates while installing" isn't doing the trick.

Could be that "Download updates while installing" is broken on the 16.04 ISOs, or this could be a result of the phased updates done by the Ubuntu Software Updater and related components that use the same code paths.

So after installing Ubuntu 16.04, Unity/Compiz was broken. I had to switch to a VT with Control+Alt+F1, log in, and then run:

sudo apt-get update
sudo apt-get dist-upgrade

Which gave me the updated libllvm3.8 package. Then after a reboot, everything is shiny and working.

description: updated
Srecko Toroman (sreckotoroman) wrote :

Unfortunately, this broke my installation completely. Chroot-ing into the broken installation and running the updates fixed my problem. Here's the walk-through if anyone is still stuck: https://toroman.wordpress.com/2016/04/26/getting-out-of-a-broken-ubuntu-16-04-system-after-an-upgrade/

Jason, I checked the sha256 sum of the ISO. It is correct. I also tried setting nomodeset in the live USB and then booting. It didn't hang then but nothing really happened, it didn't start the installation. I could see the screen and move the mouse around but no option to start the installation.

I switched to tty1 and saw dmesg (I don't quite remember if it was dmesg or something else) but I saw the messages: "invalid opcode when using llvmpipe".

Jason Gerard DeRose (jderose) wrote :

Okay, seems that "Download updates while installing" has no effect with the Ubiquity version on the 16.04 ISOs.

I filed a bug against Ubiquity for this, would appreciate if someone could confirm my findings:

https://bugs.launchpad.net/ubuntu/+source/ubiquity/+bug/1580232

Ara Pulido (apulido) on 2016-05-17
Changed in oem-priority:
status: Fix Committed → Fix Released
Jason Gerard DeRose (jderose) wrote :

On a whim, I just checked in on this with the 20160601 Xenial daily amd64 ISO (sha1sum e07c8b4df1fc71a487fafb309bd318041a65774f), and everything works great:

http://cdimages.ubuntu.com/xenial/daily-live/pending/

So seems things are on track for this not being a problem in the 16.04.1 release.

System76 (salmon76) on 2016-11-09
Changed in mesa (Ubuntu Xenial):
status: Confirmed → Invalid
Changed in mesa (Ubuntu):
status: Confirmed → Invalid