[Feature] Intel new CPU microcode 20140913

Bug #1370352 reported by Yingying Zhao on 2014-09-17
This bug affects 1 person
Affects                    Importance   Assigned to
intel                      Undecided    Unassigned
intel-microcode (Ubuntu)   Undecided    Chris J Arges
  Trusty                   Undecided    Unassigned
  Utopic                   Undecided    Tim Gardner

Bug Description

The new CPU microcode includes some fixes on Haswell platforms.

Please download from: http://downloadcenter.intel.com/Detail_Desc.aspx?agr=Y&DwnldID=24290

Please consider uploading the new version to Ubuntu 14.10 and 14.04.

Tim Gardner (timg-tpi) on 2014-09-17
information type: Proprietary → Public
Changed in intel-microcode (Ubuntu Trusty):
status: New → In Progress
assignee: nobody → Tim Gardner (timg-tpi)
Tim Gardner (timg-tpi) on 2014-09-17
Changed in intel-microcode (Ubuntu Utopic):
assignee: nobody → Tim Gardner (timg-tpi)
status: New → In Progress
Tim Gardner (timg-tpi) on 2014-09-17
Changed in intel-microcode (Ubuntu Utopic):
status: In Progress → Fix Committed
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package intel-microcode - 2.20140913.1ubuntu1

---------------
intel-microcode (2.20140913.1ubuntu1) utopic; urgency=medium

  * Fixes for Haswell platforms
    -LP: #1370352
 -- Tim Gardner <email address hidden> Wed, 17 Sep 2014 08:03:41 -0600

Changed in intel-microcode (Ubuntu Utopic):
status: Fix Committed → Fix Released
Tim Gardner (timg-tpi) on 2014-09-17
Changed in intel-microcode (Ubuntu Trusty):
status: In Progress → Fix Committed

Hello Yingying, or anyone else affected,

Accepted intel-microcode into trusty-proposed. The package will build now and be available at http://launchpad.net/ubuntu/+source/intel-microcode/2.20140913-t-1ubuntu1 in a few hours, and then in the -proposed repository.

Please help us by testing this new package. See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on how to enable and use -proposed. Your feedback will aid us in getting this update out to other Ubuntu users.

If this package fixes the bug for you, please add a comment to this bug, mentioning the version of the package you tested, and change the tag from verification-needed to verification-done. If it does not fix the bug for you, please add a comment stating that, and change the tag to verification-failed. In either case, details of your testing will help us make a better decision.

Further information regarding the verification process can be found at https://wiki.ubuntu.com/QATeam/PerformingSRUVerification . Thank you in advance!

tags: added: verification-needed
Felix Geyer (debfx) wrote :

Please revert the 2.20140913.1ubuntu1 upload.

After loading the new microcode lots of processes die with
[ 43.611507] traps: systemd[1] trap invalid opcode ip:7f844f84a7ab sp:7fff2ccf7e28 error:0 in libpthread-2.19.so[7f844f839000+18000]
[ 44.201798] traps: dbus-daemon[1277] trap invalid opcode ip:7f848d2f67ab sp:7fff8f3c3bb8 error:0 in libpthread-2.19.so[7f848d2e5000+18000]
[ 44.202077] traps: systemd-logind[1287] trap invalid opcode ip:7f8c4bf887ab sp:7fff58f13178 error:0 in libpthread-2.19.so[7f8c4bf77000+18000
[ 44.202101] traps: thermald[1269] trap invalid opcode ip:7fdf60f0c7ab sp:7fffa55068e8 error:0 in libpthread-2.19.so[7fdf60efb000+18000]

It's fine after a reboot so it seems to be a problem with upgrading from the 2.20140624.1ubuntu1 microcode to 2.20140913.1ubuntu1.

% cat /proc/cpuinfo
processor : 0
vendor_id : GenuineIntel
cpu family : 6
model : 60
model name : Intel(R) Core(TM) i5-4570 CPU @ 3.20GHz
stepping : 3
microcode : 0x1c
cpu MHz : 800.000
cache size : 6144 KB
physical id : 0
siblings : 4
core id : 0
cpu cores : 4
apicid : 0
initial apicid : 0
fpu : yes
fpu_exception : yes
cpuid level : 13
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc aperfmperf eagerfpu pni pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2 ssse3 fma cx16 xtpr pdcm pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm abm ida arat epb xsaveopt pln pts dtherm tpr_shadow vnmi flexpriority ept vpid fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid
bogomips : 6385.61
clflush size : 64
cache_alignment : 64
address sizes : 39 bits physical, 48 bits virtual
power management:
[...]
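
To confirm which microcode revision each logical CPU reports, the "microcode" field of /proc/cpuinfo can be summarised. This is a minimal sketch, not part of the package: the function name microcode_revisions is hypothetical, and it assumes the "microcode : 0x.." field layout shown in the output above.

```shell
# Print each distinct microcode revision found in a cpuinfo-style file,
# followed by the number of logical CPUs reporting it.
# Reads /proc/cpuinfo unless another file is given as the first argument.
microcode_revisions() {
    awk -F': *' '/^microcode[ \t]*:/ { n[$2]++ }
                 END { for (r in n) print r, n[r] }' "${1:-/proc/cpuinfo}"
}
```

Calling microcode_revisions with no argument reads the live /proc/cpuinfo; more than one output line would suggest that the CPUs are not all on the same revision.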

Felix Geyer (debfx) wrote :

Full dmesg output:
[ 43.606830] microcode: CPU0 sig=0x306c3, pf=0x2, revision=0x1a
[ 43.608466] microcode: CPU0 updated to revision 0x1c, date = 2014-07-03
[ 43.608494] microcode: CPU1 sig=0x306c3, pf=0x2, revision=0x1a
[ 43.609327] microcode: CPU1 updated to revision 0x1c, date = 2014-07-03
[ 43.609352] do_trap: 267 callbacks suppressed
[ 43.609354] traps: rs:main Q:Reg[1343] trap invalid opcode ip:7f32abd0b7ab sp:7f32a9062848 error:0
[ 43.609355] microcode: CPU2 sig=0x306c3, pf=0x2, revision=0x1a
[ 43.609358] in libpthread-2.19.so[7f32abcfa000+18000]
[ 43.610204] microcode: CPU2 updated to revision 0x1c, date = 2014-07-03
[ 43.610225] microcode: CPU3 sig=0x306c3, pf=0x2, revision=0x1a
[ 43.611081] microcode: CPU3 updated to revision 0x1c, date = 2014-07-03
[ 43.611507] traps: systemd[1] trap invalid opcode ip:7f844f84a7ab sp:7fff2ccf7e28 error:0 in libpthread-2.19.so[7f844f839000+18000]
[...]
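
For anyone collecting similar reports, the affected processes can be pulled out of a saved kernel log. A minimal sketch, assuming the "traps: NAME[PID] trap invalid opcode" line format seen above and a log saved to a file (for example with dmesg > dmesg.txt); the function name trapped_processes is hypothetical.

```shell
# List the processes that hit "trap invalid opcode" in a saved kernel log,
# one name per line, sorted and de-duplicated.
# Usage: trapped_processes LOGFILE
trapped_processes() {
    sed -n 's/.*traps: \(.*\)\[[0-9]*] trap invalid opcode.*/\1/p' "$1" | sort -u
}
```

An empty result means no such trap lines were found in the file. It does not rule out traps that were rate-limited by the kernel ("callbacks suppressed").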

On Wed, 17 Sep 2014, Felix Geyer wrote:
> Please revert the 2.20140913.1ubuntu1 upload.
>
> After loading the new microcode lots of processes die with
> [ 43.611507] traps: systemd[1] trap invalid opcode ip:7f844f84a7ab sp:7fff2ccf7e28 error:0 in libpthread-2.19.so[7f844f839000+18000]
> [ 44.201798] traps: dbus-daemon[1277] trap invalid opcode ip:7f848d2f67ab sp:7fff8f3c3bb8 error:0 in libpthread-2.19.so[7f848d2e5000+18000]
> [ 44.202077] traps: systemd-logind[1287] trap invalid opcode ip:7f8c4bf887ab sp:7fff58f13178 error:0 in libpthread-2.19.so[7f8c4bf77000+18000
> [ 44.202101] traps: thermald[1269] trap invalid opcode ip:7fdf60f0c7ab sp:7fffa55068e8 error:0 in libpthread-2.19.so[7fdf60efb000+18000]
>
> It's fine after a reboot so it seems to be a problem with upgrading from
> the 2.20140624.1ubuntu1 microcode to 2.20140913.1ubuntu1.

This is bad. And it is a first, too.

IF you can test this safely, could you check what happens when this new
microcode update is applied during boot?

There are two scenarios: early initramfs (available for kernels 3.10 and
above), and normal initramfs.

It would be really helpful to know what happens in both scenarios. You can
select which one is used by following the steps below.

WARNING: please have a backup initramfs image (with the previous microcode)
handy, so that you can tell grub to use it instead:
   cd /boot
   cp <working initramfs> <working initramfs>.safe

After that:

1. edit /etc/default/intel-microcode with a text editor and change the config
to IUCODE_TOOL_INITRAMFS=yes (to test regular initramfs) or
    IUCODE_TOOL_INITRAMFS=early (to test early initramfs).

2. regenerate the initramfs, with:
    update-initramfs -u

3. reboot

4. check boot logs for badness (trap invalid opcode).

5. change IUCODE_TOOL_INITRAMFS= to the other option, and repeat steps 2-4.
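
The config edit in step 1 could also be scripted. This is only a sketch under stated assumptions: the helper name set_initramfs_mode is hypothetical, it accepts only the three values mentioned in this thread (yes, early, no), and it assumes the file already contains an uncommented IUCODE_TOOL_INITRAMFS= line. The remaining steps appear as comments because they need root and a reboot.

```shell
# Sketch only: set IUCODE_TOOL_INITRAMFS in a config file to yes, early or no.
# Assumes the file already contains an uncommented IUCODE_TOOL_INITRAMFS= line.
# Usage: set_initramfs_mode MODE [CONFIG_FILE]
set_initramfs_mode() {
    _sim_mode=$1
    _sim_conf=${2:-/etc/default/intel-microcode}
    case $_sim_mode in
        yes|early|no) ;;
        *) echo "unknown mode: $_sim_mode" >&2; return 1 ;;
    esac
    _sim_tmp=$(mktemp) || return 1
    # Rewrite via a temporary file, then copy back so file permissions are kept.
    sed "s/^IUCODE_TOOL_INITRAMFS=.*/IUCODE_TOOL_INITRAMFS=$_sim_mode/" "$_sim_conf" > "$_sim_tmp" &&
        cat "$_sim_tmp" > "$_sim_conf"
    _sim_status=$?
    rm -f "$_sim_tmp"
    return "$_sim_status"
}

# One pass of steps 1-3, run as root after backing up the working initramfs:
#   set_initramfs_mode yes && update-initramfs -u && reboot
# After the reboot, step 4 is a search of the boot log:
#   dmesg | grep 'trap invalid opcode'
```

Passing a scratch copy of the config file as the second argument allows a dry run before touching /etc/default/intel-microcode.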

Thank you!

--
  "One disk to rule them all, One disk to find them. One disk to bring
  them all and in the darkness grind them. In the Land of Redmond
  where the shadows lie." -- The Silicon Valley Tarot
  Henrique Holschuh

Chris J Arges (arges) on 2014-09-17
tags: added: verification-failed
removed: verification-needed
Tim Gardner (timg-tpi) on 2014-09-17
Changed in intel-microcode (Ubuntu Utopic):
status: Fix Released → In Progress
Felix Geyer (debfx) wrote :

Both "yes" and "early" work fine.
However when I set IUCODE_TOOL_INITRAMFS=no, update the initramfs, reboot and then reinstall intel-microcode I get the same trap invalid opcode.

I guess no processes that use libpthread run during early boot when the microcode update is applied.

early:
[ 0.000000] CPU0 microcode updated early to revision 0x1c, date = 2014-07-03
[ 0.083506] CPU1 microcode updated early to revision 0x1c, date = 2014-07-03
[ 0.097821] CPU2 microcode updated early to revision 0x1c, date = 2014-07-03
[ 0.112006] CPU3 microcode updated early to revision 0x1c, date = 2014-07-03
[ 0.547442] microcode: CPU0 sig=0x306c3, pf=0x2, revision=0x1c
[ 0.547513] microcode: CPU1 sig=0x306c3, pf=0x2, revision=0x1c
[ 0.547587] microcode: CPU2 sig=0x306c3, pf=0x2, revision=0x1c
[ 0.547661] microcode: CPU3 sig=0x306c3, pf=0x2, revision=0x1c

yes:
[ 0.557422] microcode: CPU0 sig=0x306c3, pf=0x2, revision=0x16
[ 0.557496] microcode: CPU1 sig=0x306c3, pf=0x2, revision=0x16
[ 0.557569] microcode: CPU2 sig=0x306c3, pf=0x2, revision=0x16
[ 0.557643] microcode: CPU3 sig=0x306c3, pf=0x2, revision=0x16
[ 2.725811] microcode: CPU0 sig=0x306c3, pf=0x2, revision=0x16
[ 2.727303] microcode: CPU0 updated to revision 0x1c, date = 2014-07-03
[ 2.727352] microcode: CPU1 sig=0x306c3, pf=0x2, revision=0x16
[ 2.728156] microcode: CPU1 updated to revision 0x1c, date = 2014-07-03
[ 2.728182] microcode: CPU2 sig=0x306c3, pf=0x2, revision=0x16
[ 2.728954] microcode: CPU2 updated to revision 0x1c, date = 2014-07-03
[ 2.728996] microcode: CPU3 sig=0x306c3, pf=0x2, revision=0x16
[ 2.729867] microcode: CPU3 updated to revision 0x1c, date = 2014-07-03
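
To compare the two modes side by side, the update lines can be reduced to CPU, revision and date. A rough sketch, assuming the two message formats shown above ("CPUn microcode updated early to revision ..." and "microcode: CPUn updated to revision ..."); the function name microcode_updates is hypothetical.

```shell
# From a saved kernel log, print one "CPUn REVISION DATE" line for every
# microcode update message, covering both the early and the regular format.
# Usage: microcode_updates LOGFILE
microcode_updates() {
    sed -n 's/.*\(CPU[0-9][0-9]*\).* updated .*to revision \(0x[0-9a-f]*\), date = \(.*\)$/\1 \2 \3/p' "$1"
}
```

Lines that only report the current signature and revision ("sig=..., revision=...") are skipped, so the output lists applied updates only.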

Launchpad Janitor (janitor) wrote :

This bug was fixed in the package intel-microcode - 2.20140913.1ubuntu2

---------------
intel-microcode (2.20140913.1ubuntu2) utopic; urgency=medium

  * It appears microcode-20140913.dat introduced regressions(s)
    https://bugs.launchpad.net/intel/+bug/1370352/comments/3
    Deleted microcode-20140913.dat
    -LP: #1370352
 -- Tim Gardner <email address hidden> Wed, 17 Sep 2014 12:26:59 -0600

Changed in intel-microcode (Ubuntu Utopic):
status: In Progress → Fix Released

On Wed, 17 Sep 2014, Felix Geyer wrote:
> Both "yes" and "early" work fine.

Thank you. That helps immensely!

At least now we have an easy possible workaround: have the package postinst
blacklist a few cpuids from runtime updates, or drop runtime updates entirely
and always require a reboot.

> However when I set IUCODE_TOOL_INITRAMFS=no, update the initramfs, reboot
> and then reinstall intel-microcode I get the same trap invalid opcode.
>
> I guess no processes that use libpthread run during early boot when the
> microcode update is applied.

Felix, if it wouldn't be too much trouble, maybe you could attach a copy of
/proc/cpuinfo with microcode 0x1c (the new one), and one with either
microcode 0x16 (no microcode update at all), or with microcode 0x1a (the
previous release of intel-microcode) to this bug report?

It would be necessary to reboot between the microcode changes in that case
and use the early initramfs mode, so that we can be sure the "flags" line in
/proc/cpuinfo will be correct (it is set at boot time, and will not be
updated by a microcode update). You can use the early initramfs update
mode, which is always safer.

It would also be nice to know which instructions are trapping, and to get Intel
engineers to look at it. I will leave that for the Canonical Intel team, as
they can contact Intel directly: I don't have that kind of access to Intel
engineers...

For the record, we have these scenarios:

cpuid 0x306c3, platform flags 0x02:

Microcode       early mode and initramfs   online
0x16 to 0x1c    OK                         FAIL
0x16 to 0x1a    OK                         OK
0x1a to 0x1c    N/A                        FAIL

FAIL: processor generates spurious invalid opcode traps on libpthread.

--
  Henrique Holschuh

Felix Geyer (debfx) wrote :

It looks like this is the microcode update that disables TSX where it is broken.
The hle flag is removed from cpuinfo flags (see attached cpuinfo files).
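
A quick way to see whether the TSX flags are still advertised is to look for "hle" and "rtm" in the flags line. A minimal sketch; the function name tsx_flags is hypothetical. As noted earlier in this thread, the flags line is set at boot and is not refreshed by a runtime microcode update, so the result is only meaningful after a reboot with the microcode applied during boot.

```shell
# Print whichever of the TSX feature flags (hle, rtm) appear in the first
# "flags" line of a cpuinfo-style file, one per line.
# Returns non-zero when neither flag is present.
# Reads /proc/cpuinfo unless another file is given as the first argument.
tsx_flags() {
    grep '^flags' "${1:-/proc/cpuinfo}" | head -n 1 |
        tr -s ' \t' '\n\n' | grep -x -e hle -e rtm
}
```

For example, tsx_flags || echo "no TSX flags advertised" prints the message when both flags are absent.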


On Wed, 17 Sep 2014, Felix Geyer wrote:
> It looks like this is the microcode update that disables TSX where it is
> broken. The hle flag is removed from cpuinfo flags (see attached cpuinfo
> files).

I thought as much.

Also, let me guess: if you update in IUCODE_TOOL_INITRAMFS=yes, the "hle"
flag doesn't disappear.

It also explains the illegal opcode exceptions if libpthread-2.19 is using
Intel TSX for lock elision... which looks likely. Note that even if "hle"
was removed from the processor flags, it would still crash running
processes.

--
  Henrique Holschuh

For your information:

Thread on LKML (Linux kernel), related to this problem:
https://lkml.org/lkml/2014/9/18/218

Debian bug, requesting that glibc add a blacklist to disable use of HLE and
RTM in libpthread to protect users running outdated processor microcode:

https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=762195

I am also going to modify Debian intel-microcode behavior in response to
this bug, but I will wait until we have the complete picture of how this
should be addressed at the various levels, first.

--
  Henrique Holschuh

Mathew Hodson (mathew-hodson) wrote :

Removed from trusty-proposed.

Changed in intel-microcode (Ubuntu Trusty):
status: Fix Committed → In Progress

FYI: fixed upstream (in Debian) through the mandatory use of early microcode on all automated updates.

Tim Gardner (timg-tpi) on 2014-12-03
Changed in intel-microcode (Ubuntu):
assignee: Tim Gardner (timg-tpi) → Chris J Arges (arges)
Chris J Arges (arges) wrote :

bug 1370352 is in -proposed to address the glibc HLE/RTM blacklist in pthreads

Chris J Arges (arges) wrote :

Comment #16 should be bug 1398975 fwiw.

Tim Gardner (timg-tpi) wrote :

I'm inclined to leave this alone for Trusty.

Changed in intel-microcode (Ubuntu Trusty):
assignee: Tim Gardner (timg-tpi) → nobody
status: In Progress → Won't Fix
Changed in intel:
status: New → Fix Released