libc6 upgrade fails: illegal instruction

Bug #587186 reported by Martin-Éric Racine on 2010-05-29
This bug affects 13 people
Affects                    Status        Importance  Assigned to   Milestone
binutils (Fedora)          Fix Released  Medium
binutils (Ubuntu)                        High        Unassigned
  Maverick                               Undecided   Unassigned
  Natty                                  High        Unassigned
eglibc (Ubuntu)                          High        Unassigned
  Maverick                               Undecided   Unassigned
  Natty                                  High        Unassigned
gcc-4.4 (Ubuntu)                         High        Unassigned
  Maverick                               Undecided   Unassigned
  Natty                                  High        Unassigned
gcc-4.5 (Ubuntu)                         Medium      Unassigned
  Maverick                               Undecided   Unassigned
  Natty                                  Medium      Unassigned
update-manager (Ubuntu)                  Medium      Michael Vogt
  Maverick                               Undecided   Michael Vogt
  Natty                                  Medium      Michael Vogt

Bug Description

Preparing to replace libc6 2.11.1-0ubuntu9 (using .../libc6_2.12~20100519-0ubuntu1_i386.deb) ...
Checking for services that may need to be restarted...
Checking init scripts...
Unpacking replacement libc6 ...
dpkg: warning: subprocess old post-removal script killed by signal (Illegal instruction)
dpkg - trying script from the new package instead ...
dpkg: error processing /var/cache/apt/archives/libc6_2.12~20100519-0ubuntu1_i386.deb (--unpack):
 subprocess new post-removal script killed by signal (Illegal instruction)
dpkg: error while cleaning up:
 subprocess installed pre-installation script killed by signal (Illegal instruction)

ProblemType: Bug
DistroRelease: Ubuntu 10.10
Package: libc6 2.11.1-0ubuntu9
ProcVersionSignature: Ubuntu 2.6.32-10.14+bug396286v2-generic
Uname: Linux 2.6.32-10-generic i586
Architecture: i386
Date: Sat May 29 12:34:01 2010
ProcEnviron:
 LANGUAGE=fi_FI:fi:en_US:en
 PATH=(custom, user)
 LANG=fi_FI.UTF-8
 SHELL=/bin/bash
SourcePackage: eglibc

Martin-Éric Racine (q-funk) wrote :
Matthias Klose (doko) wrote :

which processor is this? anything older than i686 isn't supported in maverick

Changed in eglibc (Ubuntu):
status: New → Incomplete
Martin-Éric Racine (q-funk) wrote :

Matthias Klose, it would be a good idea to read the data attached to the bug. This is a Geode LX800, therefore i586 generic.

Changed in eglibc (Ubuntu):
status: Incomplete → New
Matthias Klose (doko) wrote :

> it would be a good idea to read the data attached to the bug.

there was none.

> This is a Geode LX800, therefore i586 generic.

ok, closing as won't fix.

Changed in eglibc (Ubuntu):
status: New → Won't Fix
Jeremy Visser (jeremy-visser) wrote :

If this is going to be the case, then producing a libc6-i586 package compiled for i586 processors wouldn’t be a bad idea.

Martin-Éric Racine (q-funk) wrote :

Matthias, please look again. There was a Uname line and it clearly said i586.

Matthias Klose (doko) wrote :

> There was a Uname line and it clearly said i586.

Martin-Éric, the line did state the machine, not the processor.

> producing a libc6-i586 package compiled for i586 processors

no, as discussed at UDS we are dropping support for anything older than i686 for maverick.

Martin-Éric Racine (q-funk) wrote :

Bear in mind that dropping support for i586 also means that Ubuntu cannot install on the OLPC and that most thin client hardware meant for LTSP won't be able to run Ubuntu either, given that the chipsets used in thin client hardware tend to be i586-compatible. As such, I honestly think that if Ubuntu is serious about dropping support for i586, then a MUCH louder and formal announcement should be made NOW, well ahead of time, so that OLPC and LTSP users have enough time to consider other distributions to switch to.

Simon Huerlimann (huerlisi) wrote :

Well, we're exactly one of those companies having chosen to use Ubuntu LTSP with i586 compatible hardware (PC-Engines ALIX boards). Mmh, good that Martin-Éric did blog about this, now at least I know that we can't upgrade anymore...

Matthias Klose (doko) wrote :

> now at least I know that we can't upgrade anymore...

lucid and all lucid point releases are unaffected by this.

Mario Limonciello (superm1) wrote :

it would probably be worthwhile to have update manager try to detect this scenario so ppl w/ boxes like this are not allowed to dist-upgrade past lucid (and break their boxes)

Alan Bell (alanbell) wrote :

Ubuntu doesn't install on the OLPC anyway. It runs a very hacked Fedora kernel. I really wouldn't worry about the hordes of Ubuntu running OLPC users. There is a Debian build that does run and there is an old version of Ubuntu based on the stock kernel. It is devices such as the Viglen MPC-L and thin client boxes that will be more of a concern, however pretty much everything new and low powered is based on the Atom chipset, and everything next year will be ARM (perhaps). Lucid is LTS so will be supported to 2013 so these platforms will have a supported Ubuntu system for several years to come. LTS releases don't offer to dist-upgrade to anything but the next LTS, so something might need to be done in the um. . . Quivering Quail cycle to stop them upgrading to a broken state. If anyone would like to help me get Lucid on my OLPC then you would get a big hug.

Martin-Éric Racine (q-funk) wrote :

Actually, the proper way to refuse to upgrade would be to follow what Debian did when they bumped the minimal platform requirement on SPARC. IIRC there was a preinst maintainer script segment that checked the machine type and exited dpkg with an error. Doing this would be the only totally foolproof method, since it cannot be assumed that someone will be using update-manager to upgrade. Instead they could very well use apt-get, aptitude or other tools, so foolproofing at the preinst level is the only truly safe option.
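
As a rough sketch of the preinst approach described above (not the actual Debian or Ubuntu maintainer script), the check could read the CPU flags from /proc/cpuinfo and refuse to continue when a required flag is missing. The helper name `cpu_has_flag`, the choice of `cmov` as the flag standing in for "i686-class", and the error wording are all assumptions made for illustration.

```sh
#!/bin/sh
# Hypothetical helper for a preinst fragment; not taken from any real package.
# cpu_has_flag FLAG [FILE]
#   Succeeds if FLAG appears as a whole word on the first "flags" line of
#   FILE (default: /proc/cpuinfo).
cpu_has_flag() {
    flag=$1
    file=${2:-/proc/cpuinfo}
    # Print the first "flags" line without its "flags : " prefix,
    # split it into one flag per line, then look for an exact match.
    sed -n 's/^flags[[:space:]]*:[[:space:]]*//p' "$file" \
        | head -n 1 \
        | tr ' ' '\n' \
        | grep -qxF -- "$flag"
}

# Intended use inside a preinst (assumption: cmov is the flag to require):
#
#   if ! cpu_has_flag cmov; then
#       echo "This package requires an i686-class CPU; aborting." >&2
#       exit 1
#   fi
```

Which flag to test is the hard part. The failure in this bug is about NOPL, so a cmov test alone would presumably not catch a Geode LX; the sketch only shows where such a check would sit.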

72 comments hidden view all 125 comments

(In reply to comment #5)
> So the CPU isn't i686 compatible and thus isn't supported.

I opened a Fesco ticket to clarify which CPUs are supported by Fedora:
https://fedorahosted.org/fesco/ticket/387

Colin Watson (cjwatson) wrote :

We should definitely have a preinst fragment here - it's only sane, and there's plenty of precedent for it.

Changed in eglibc (Ubuntu):
importance: Undecided → Medium
status: Won't Fix → Triaged
importance: Medium → High
Martin-Éric Racine (q-funk) wrote :

As discussed with the Ubuntu developers, it appears that this could be fixed by adding "-mtune=generic32" to the compiler defaults, to avoid generating code that includes the undocumented NOPL instruction. This would restore compatibility with at least some single-chip architectures such as recent Geode products whose instruction set is nearly 686-compatible.

Changed in gcc-4.4 (Ubuntu):
importance: Undecided → High
Changed in gcc-4.4 (Ubuntu):
status: New → Triaged

On 02.06.2010 18:21, Martin-Éric Racine wrote:
> As discussed with the Ubuntu developers, it appears that this could be
> fixed by adding "-mtune=generic32" to the compiler defaults,

that would be wrong. the correct fix is the one mentioned in
http://lkml.org/lkml/2008/9/8/296

Matthias Klose (doko) wrote :

fixed in 4.4.4-4ubuntu1

Changed in gcc-4.4 (Ubuntu):
status: Triaged → Fix Released
Martin-Éric Racine (q-funk) wrote :

Thanks for patching GCC. Anxiously awaiting the rebuilt libc6 to test whether this does the trick.

Launchpad Janitor (janitor) wrote :

This bug was fixed in the package eglibc - 2.12-0ubuntu3

---------------
eglibc (2.12-0ubuntu3) maverick; urgency=low

  * Merge with Debian (r4318, trunk).
  * Rebuild for i386. LP: #587186.
 -- Matthias Klose <email address hidden> Fri, 04 Jun 2010 14:32:19 +0200

Changed in eglibc (Ubuntu):
status: Triaged → Fix Released
Matthias Klose (doko) wrote :

fixed in gcc-4.5 4.5.0-5ubuntu1

Changed in gcc-4.5 (Ubuntu):
status: New → Fix Released
Martin-Éric Racine (q-funk) wrote :

I'm afraid that the patch selected by Matthias didn't fix it:

Preparing to replace libc6 2.11.1-0ubuntu9 (using .../libc6_2.12-0ubuntu3_i386.deb) ...
Checking for services that may need to be restarted...
Checking init scripts...
Unpacking replacement libc6 ...
dpkg: warning: subprocess old post-removal script killed by signal (Illegal instruction)
dpkg - trying script from the new package instead ...
dpkg: error processing /var/cache/apt/archives/libc6_2.12-0ubuntu3_i386.deb (--unpack):
 subprocess new post-removal script killed by signal (Illegal instruction)
dpkg: error while cleaning up:
 subprocess installed pre-installation script killed by signal (Illegal instruction)
Errors were encountered while processing:
 /var/cache/apt/archives/libc6_2.12-0ubuntu3_i386.deb
E: Sub-process /usr/bin/dpkg returned an error code (1)

Changed in eglibc (Ubuntu):
status: Fix Released → Triaged
Changed in gcc-4.4 (Ubuntu):
status: Fix Released → Triaged
Changed in gcc-4.5 (Ubuntu):
status: Fix Released → Triaged
Changed in gcc-4.5 (Ubuntu):
importance: Undecided → Medium
Changed in update-manager (Ubuntu):
importance: Undecided → Medium
Matthias Klose (doko) wrote :

Martin-Éric, did you verify that this is the same instruction?

Changed in eglibc (Ubuntu):
status: Triaged → Incomplete
Martin-Éric Racine (q-funk) wrote :

Can you please provide instructions for checking which instruction caused the error?

There is no clarification needed by FESCo. The release was intended to support the Geode LX -- whether or not it is an "i686" (it isn't). Nothing in the F13 release specifications says anything about changing the set of supported CPUs from F12, which clearly supported the Geode LX (see Comment 1).

Further investigation in the released Fedora 13 i686 LiveCD reveals that NOPL instructions only occur in binaries produced from glibc, and in a couple of statically linked binaries (/sbin/sln and /sbin/mdadm.static) linked with glibc.
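
A survey like this can be approximated by disassembling each binary and counting long-NOP mnemonics. The sketch below assumes GNU objdump's usual AT&T-syntax output, where the multi-byte NOPs show up as `nopl` or `nopw`. The function name `count_long_nops` is made up for illustration, and this is not necessarily how the survey above was carried out.

```sh
#!/bin/sh
# count_long_nops: read a disassembly listing on stdin and print how many
# lines contain a nopl or nopw mnemonic (hypothetical helper name).
count_long_nops() {
    # Match the mnemonic as a whole word; grep -c exits non-zero when the
    # count is 0, so "|| true" keeps the function's exit status clean.
    grep -cE '(^|[[:space:]])nop[lw]([[:space:]]|$)' || true
}

# Typical use (the path is an example; adjust to the binary being checked):
#   objdump -d /lib/libc.so.6 | count_long_nops
```

A count of zero only means the disassembler printed no such mnemonic; data embedded in executable sections can still be misread either way, so treat the number as a hint rather than proof.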

NOPL instructions can be generated by the GNU Assembler for .align commands that occur in executable sections. GCC can generate .align when optimizing, to align loop starts on cache line boundaries. Gas only generates NOPL when instructed that the machine architecture supports it. Apparently, sometime between F12 and F13, either gas's .align support was improved or enbugged, or the arguments to gas when compiling glibc were changed to an inappropriate value that excluded the Geode LX.

This is not a generic problem with every package in the release -- it's a bug in the build configuration or Makefiles of glibc. This should be fixable by rebuilding the binary packages derived from glibc. This won't fix the release CDs and DVDs, but will fix the repos, which is key for the largest affected Fedora customer, OLPC. However, this problem HAS been reported by other users on other Geode LX hardware; see:

  http://sharkcz.livejournal.com/5708.html

This bug was timely reported, but the report was ignored while there was plenty of time to fix the release.

This bug should be reopened -- but bugzilla.redhat.com does not seem to offer me a user interface with which to do that.

The priority of this bug should be increased, because it causes the release media to fail to boot and fail to install on supported hardware. For the same reason, it should also be documented in the release notes here:

  http://fedoraproject.org/wiki/Common_F13_bugs#Hardware-related_issues

It's true that Geode LX is not-quite i686 and we seem to be aware of some other places in which it is lacking (in addition to nopl).

This time around it does appear that the only problematic package we know about is glibc, so I'll focus on that case.

The reason that glibc is the only affected package is that glibc's build system checks if the assembler supports the "-mtune" option, and if so, when building for i686 it passes -mtune=i686.

Normally gcc does not pass -mtune or -march to the assembler (surprised me, too). But glibc explicitly checks if the assembler accepts it, and if so, it passes -Wa,-mtune=i686 to gcc, causing gcc to pass -mtune=i686 to the assembler. (the assembler's -march option remains unused)

Passing -mtune=i686 to the assembler causes nopl to be used. This indicates an assembler bug, because mtune should produce enhanced code for a certain arch without breaking compat, but in this case using nopl instructions *will* break compat. This is http://sourceware.org/bugzilla/show_bug.cgi?id=6957 (looks like the developer wants a test case before continuing, I'll try and come up with one now)

Alright. Posted some test cases on the upstream bug and already got a response.

Invoking "as -mtune=i686" is effectively equivalent to invoking "as -march=i686 -mtune=i686" (a small behavioural difference when compared to gcc). So it will look to produce optimized code for i686, with the whole of the i686 instruction set at its disposal.

So there is no glibc or assembler bug here. We're asking glibc to build for i686, and glibc is doing a better job than most packages of communicating the target architecture through the whole build chain. (by default the optimization flags do not reach the assembler, but glibc makes this happen)
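
To see what the assembler does with alignment padding under different options, one can feed it a tiny test case and disassemble the result. The sketch below only generates the test source; the function name `emit_align_testcase` is hypothetical, and the padding actually chosen depends on the binutils version and options in use, so no particular output is promised.

```sh
#!/bin/sh
# emit_align_testcase: print a minimal assembly source in which the
# assembler must insert 15 bytes of padding inside an executable section
# (a 1-byte nop followed by a 16-byte alignment request).
emit_align_testcase() {
    printf '%s\n' \
        '        .text' \
        '        nop' \
        '        .p2align 4' \
        '        ret'
}

# Possible use (assumes GNU as and objdump are installed):
#   emit_align_testcase | as --32 -mtune=i686 -o align-i686.o
#   emit_align_testcase | as --32 -mtune=i586 -o align-i586.o
#   objdump -d align-i686.o
#   objdump -d align-i586.o
```

Comparing the two disassemblies shows which padding sequence the assembler picked for each tuning target.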

If we want to fix F13, we have a number of options available:
 - build glibc for target arch i586
 - hack glibc build system to use -Wa,-mtune=i586 (i.e. we ask all parts of the build chain to optimize for i686, except for the assembler which we ask to optimize for i586)
 - hack glibc build system to use -Wa,-mtune=i686 -Wa,-march=i586 (i.e. ask the assembler to optimize for i686, but never going outside of the instruction set supported by i586 processors)
 - hack glibc build system so that we don't ask assembler to do any optimizations (like the rest of the world)

In the FESCO meeting, we decided that we should gain a full understanding of this issue before coming up with a decision (do we hack glibc in F13 to fix this? Do we change target arch for F14? or both? or...?). I think with my analysis we have now gained an understanding.

And there is also the NOPL emulator for kernel, that seems to be simple enough. I would use this (the Geode LX is very close to i686) and do iX86 < i686 as secondary arch where other people would profit too.

The NOPL emulator or letting Geode LX use a secondary i586 (or i486) arch is the only reasonable way to resolve this. All packages are built with -march=i686, so even if as -mtune=i686 perhaps shouldn't automatically use i686 insns, -march=i686 certainly should. gcc should switch to telling as to use i686 optimized code for -march=i686, so all packages will eventually have i686 nops.

(In reply to comment #11)
> If we want to fix F13, we have a number of options available:
> - hack glibc build system to use -Wa,-mtune=i686 -Wa,-march=i586 (i.e. ask the
> assembler to optimize for i686, but never going outside of the instruction set
> supported by i586 processors)

$ echo 'cmove %eax,%edx' | as -32 -march=i586 -mtune=i686
{standard input}: Assembler messages:
{standard input}:1: Error: `cmove' is not supported on `i586'

Having somebody else fix the bug in some other software component always "seems to be simple enough". In this case, it isn't; there is serious dissent in the kernel team about whether, and how, to make this "simple" instruction emulation work. Independent of the kernel-architecture disagreement, nobody has yet posted a correct patch for "simply" emulating it in the kernel (the best patch opens security holes by peeking at userspace without appropriate permission checks). And if NOPL is going to become common in every program, as Jakub suggested, the last thing we want is for it to take thousands of cycles because it has to trap into the kernel every time it wants to do a no-op!

Perhaps another way to look at the problem is that gcc and gas do not provide a way to target the instruction set of the Geode LX processor. Fedora definitely wants to target these processors, because they are a significant part of the installed base of Fedora - 1.5 million machines to date. (Here's the Fedora Feature notice for the change in F11 that made i586 support the base: http://fedoraproject.org/wiki/Features/ArchitectureSupport . And here's the F12 change that was supposed to support both the i686 AND the Geode LX while desupporting the i586: http://fedoraproject.org/wiki/Features/F12X86Support )

Yet there are no compiler/gas switches that let the full instruction set of the Geode LX processor be used without also including instructions that the processor doesn't implement. This may be a result of the unfortunate Intel-Corporation orientation of the instruction set (in general they like to ignore their competition). Intel did not document the NOPL instruction for the i686, yet all their 686 processors happened to implement it, so it crept into the tools even though it was not part of the documented spec. But other vendors who merely implemented the spec did not implement it. Here is the 1997 "Intel Architecture Software Developer's Manual: Volume 2: Instruction Set Reference" from the Pentium Pro era, as retrieved by the Internet Archive: http://web.archive.org/web/20070221130324/http://developer.intel.com/design/pentium/manuals/24319101.pdf . It does not document NOPL, only the 1-byte NOP instruction. Contrast this with the current "Intel 64 and IA-32 Architectures Software Developers Manual, Volume 2", which does document long NOP instructions: http://www.intel.com/Assets/PDF/manual/253667.pdf .

So one possible resolution is to keep the "i686" designation for this instruction set, but to remove NOPL from the "i686" instruction set, making the GNU tools match the i686 (Pentium Pro and successors) documentation.

What is the actual performance impact of altering GAS to avoid the use of NOPL on the i686 while aligning in executable sections? Since GAS already knows how to align on every kind of processor (it picks from among four or five different alternative NOP selections already), patching gas's NOP selection on i686 seems to be the least intrusive change that would result in long-term support for both the i686 and the Geode LX.

A more intrusive change would be to define an instruction set "geodelx" and for Fedora to supply -march=geodelx -...

Comment 15: Well argued.

Another way to look at this is that 32 bit x86 is already a legacy
architecture. No one who cares about performance should be using
it -- all those power users long switched to x86-64. If you accept
this argument then what we in Fedora should be doing is not making
much effort to optimize 32 bit legacy systems at all. Rather, we should
work on broadening support to as many old systems as possible by
compiling for the lowest common denominator (i486 or i586 systems
were probably the oldest ones that would have had enough memory to
run mini Fedora spins).

I don't think the 36.3 million people who bought Atom netbooks in 2009 think their system is "legacy" -- nor the 58 million expected to buy in 2010 (estimated May 2010: http://www.abiresearch.com/press/1656-2009+Netbook+Shipments+Pass+Expectations%2C+58+Million+Forecast+for+2010). All CPU vendors are still selling high and growing volumes of 32-bit-only x86 chips. Their users do care about performance, though the issue is much more about performance-per-watt than about raw performance. X86 chips without amd64 support are mostly used either in mobile applications where battery life matters, or in embedded boards with limited heat budgets (or fanless operation).

Given the higher penetration of Linux in netbooks (and OLPCs) than on desktops, there may well be more people running Fedora on x86 than on amd64. Indeed that's exactly what smolt shows: http://smolt.fedoraproject.org/static/stats/stats.html, with 69% on x86 and 30% on 64-bit, out of about 203,000 reporting hosts. (Smolt never runs on 1.5 million Fedora OLPCs that are Geode LX's. XO-1.5's will replace XO-1 sales this year; they use the Via C7-M ULV processor which is also only 32-bit x86 compatible, and all of them will run Fedora.)

Unfortunately the web and repo stats aren't broken out by architecture: http://fedoraproject.org/wiki/Statistics . Perhaps this page can be improved to tell us how many accesses are happening to x86 versus x64 repos?

The overall picture is clear: 32-bit x86 is still the dominant architecture in use in Fedora. We shouldn't break high volume 32-bit chips in Fedora, nor should we stop optimizing for 32-bit systems.

*** Bug 594660 has been marked as a duplicate of this bug. ***

Martin-Éric Racine (q-funk) wrote :

As found by Fedora, the real issue is with GAS: https://bugzilla.redhat.com/show_bug.cgi?id=579838

Changed in binutils (Ubuntu):
importance: Undecided → High
Matthias Klose (doko) on 2010-06-14
Changed in binutils (Ubuntu):
status: New → Invalid
Changed in gcc-4.4 (Ubuntu):
status: Triaged → Invalid
Changed in gcc-4.5 (Ubuntu):
status: Triaged → Invalid
Martin-Éric Racine (q-funk) wrote :

Matthias, sorry, but in what way is this bug suddenly invalid?

Matthias Klose (doko) wrote :

the bug isn't closed, just kept the eglibc task open

Martin-Éric Racine (q-funk) wrote :

Noted. Is the bug still incomplete, then? If yes, can you please provide me with instructions on how to get a trace on a package being unpacked, in a situation when the package happens to be libc6, which affects the operation of everything else on top?

Matthias Klose (doko) wrote :

well, somewhat ;) there was a request to provide a list of packages which are required from your point of view to be runnable on this platform. Please could you open a separate bug report for this and reference it here?

Martin-Éric Racine (q-funk) wrote :

That's completely unrelated to this bug and beside the point. You had asked me to track down which instruction causes the illegal instruction error and I asked for instructions on how to do that. I'm asking if, given the existing information on the FC bug, this still needs testing and, if yes, how.

Matthias Klose (doko) wrote :

On 14.06.2010 17:08, Martin-Éric Racine wrote:
> That's completely unrelated to this bug and beside the point.

No. We have more than one option to resolve this issue. One of them is to ignore
the request and provide a PPA with packages explicitly optimized for i586 for a
project.

Changed in eglibc (Ubuntu):
status: Incomplete → Confirmed
Martin-Éric Racine (q-funk) wrote :

Then my answer has to be "everything" since I'm running a Geode host with a normal hard-disk and packages that occasionally get installed or removed as needed.

Created attachment 425520
Disable assembler optimizations

in the FESCO meeting it was decided to patch glibc for F13. Here is an appropriate patch.

The response for F14+ is not yet clear, but will continue to be discussed in Tuesday's meeting.

https://fedorahosted.org/fesco/ticket/387

Martin-Éric Racine (q-funk) wrote :

GCC seems to offer -march=geode and -mtune=geode since GCC 4.3, so I'm wondering if using these in combination with i686 optimization might accomplish what we need?

*** Bug 607186 has been marked as a duplicate of this bug. ***

Martin-Éric Racine (q-funk) wrote :

The "upstream" Fedora bug just saw an attachment added a few days ago. What it does is disable the -mtune options for i686 in libc6 build scripts.

Launchpad Janitor (janitor) wrote :

This bug was fixed in the package eglibc - 2.12-0ubuntu4

---------------
eglibc (2.12-0ubuntu4) maverick; urgency=low

  * Update to the eglibc 2.12 branch (r10817).
    - patches/any/cvs-flush-cache-textrels.diff: Remove.
    - patches/any/cvs-redirect-throw.diff: Remove.
  * Merge with Debian (r4360, trunk, 2.11.2-2).
  * On i386, don't build with -Wa,-mtune=i686. LP: #587186.
 -- Matthias Klose <email address hidden> Mon, 28 Jun 2010 00:47:05 +0200

Changed in eglibc (Ubuntu):
status: Confirmed → Fix Released
Martin-Éric Racine (q-funk) wrote :

I confirm that eglibc 2.12-0ubuntu4 apparently fixes it. There is no more "Illegal instruction" error during upgrade.

Matthias Klose (doko) wrote :

On 28.06.2010 06:21, Martin-Éric Racine wrote:
> I confirm that eglibc 2.12-0ubuntu4 apparently fixes it. There is no
> more "Illegal instruction" error during upgrade.

just to note that while we may have this patch in maverick, it's no guarantee to
have it in later releases as well.

Martin-Éric Racine (q-funk) wrote :

Why wouldn't it be included?

Not to be impatient, but is there an estimate of when we might see a patched glibc for F13 hitting the testing repo?

glibc-2.12-3 has been submitted as an update for Fedora 13.
http://admin.fedoraproject.org/updates/glibc-2.12-3

glibc-2.12-3 has been pushed to the Fedora 13 testing repository. If problems still persist, please make note of it in this bug report.
 If you want to test the update, you can install it with
 su -c 'yum --enablerepo=updates-testing update glibc'. You can provide feedback for this update here: http://admin.fedoraproject.org/updates/glibc-2.12-3

(In reply to comment #16)
> Comment 15: Well argued.
>
> Another way to look at this is that 32 bit x86 is already a legacy
> architecture. No one who cares about performance should be using
> it -- all those power users long switched to x86-64. If you accept
> this argument then what we in Fedora should be doing is not making
> much effort to optimize 32 bit legacy systems at all. Rather, we should
> work on broadening support to as many old systems as possible by
> compiling for the lowest common denominator (i486 or i586 systems
> were probably the oldest ones that would have had enough memory to
> run mini Fedora spins).

hear, hear. I was proud of Linux in the "old days" when it truly supported old hardware. With my Via C3 and C3-2, I have to look at other solutions, perhaps even NetBSD.

I mean this will still support the Atom netbooks. I really don't think "the up to one percentage" gained by going from i586 to i686 was a good choice. I also have an Atom netbook.

Don't most newer Intel Atoms have the 64 bit extension?

Soekris 5501 with Geode LX processor boots and runs OK after installing this test build.

glibc-2.12-3 has been pushed to the Fedora 13 stable repository. If problems still persist, please make note of it in this bug report.

*** Bug 618592 has been marked as a duplicate of this bug. ***

I've just started testing this on my Fit-PC1, which has a Geode (I would have tested it sooner, but it's been packed in boxes for a move since March and I've only just recovered it). The initial upgrade of "yum upgrade yum rpm glibc" from F-12 has worked and it's currently around 50% of the way through the yum upgrade for the rest of the distro. I'll be doing further testing over the coming days. Looks good so far.

I can verify that doing a rebuild of the Fedora13 livecd with the livecd-creator results in an image that's bootable on my Via C3-2 hardware.

Matthias Klose (doko) wrote :

now works with the Geode-LX. No need to fix it in the update-manager

Changed in update-manager (Ubuntu):
status: New → Won't Fix

FYI the Linux binutils 2.20.51.0.11 release contains a fix that no longer generates the offending NOPs for i686.

2010/08/06 patch
   http://sourceware.org/ml/binutils-cvs/2010-08/msg00057.html
2010/08/11 release
   http://gcc.gnu.org/ml/gcc/2010-08/msg00194.html

Nick Lowe (n-lowe) wrote :

Is it worth revisiting this, as it can be optimised for i686 now and not fail?

https://bugzilla.redhat.com/show_bug.cgi?id=579838xx

Quentin Neill 2010-09-03 00:38:02 EDT
"FYI the Linux binutils 2.20.51.0.11 release contains a fix that no longer
generates the offending NOPs for i686.

2010/08/06 patch
   http://sourceware.org/ml/binutils-cvs/2010-08/msg00057.html
2010/08/11 release
   http://gcc.gnu.org/ml/gcc/2010-08/msg00194.html"

Martin-Éric Racine (q-funk) wrote :

Nick: thank you for this information.

Yes, it would be worth including those fixes into Maverick and trying to see if rebuilding the toolchain, then libc6 and the 686 kernel would provide something that remains usable on a Geode LX and, hopefully, also on a Geode GX2 (which, for marketing reasons, AMD calls a GX, even though it's a second-generation Geode whose design was bought as-is from NSC).

Is there any formal FC13 CD/DVD I can download to install on a Geode? I need a full installation, which is likely Fedora-13-i386-DVD.iso?

Nick Lowe (nick-int-r) on 2010-09-07
Changed in eglibc (Ubuntu):
status: Fix Released → Incomplete
Matthias Klose (doko) on 2010-09-07
Changed in eglibc (Ubuntu):
status: Incomplete → Fix Released
Changed in eglibc (Ubuntu):
status: Fix Released → Incomplete
Matthias Klose (doko) on 2010-09-07
Changed in eglibc (Ubuntu):
status: Incomplete → Fix Released

This should really be revisited.

The basic semantics for choosing the NOP sequence were completely wrong. This has been fixed now.
The NOPL instruction is not supported by all i686 processors, the coded assumption was that they all did. This has been changed by the recent AMD patches linked to by Quentin Neill so that it is not assumed and it's specified as an extension where it is supported.

(NOPL is not standard i686, it was undocumented and has just been de facto supported since the Pentium Pro.)

The benefits of full i686 optimisation, which does have real performance implications, are instructions like bswap (useful in networking), cmpxchg/xadd (used in atomics) and cmov (useful in compiler-generated code).

The correct solution is surely to ensure that when something is compiled for generic i686, NOPL is nowhere to be seen...

Once the AMD patches that fix the bad semantics for NOPL and i686 in binutils (GAS) have been applied, the build of eglibc should be changed to restore i686 optimisation.

The frustrating thing is that broken semantics for i686 have led to absurd patches such as the following being proposed as a workaround:

http://groups.google.com/group/linux.kernel/browse_thread/thread/c1ec68f5498236dc/617726bec31595ed?show_docid=617726bec31595ed

The point that gets missed there is that, abstractly, a NOP is meant to be exactly as it says on the tin, a No Operation.

It's meant to do nothing at all - for a predefined number of clock cycles.
A NOP is commonly used for timing purposes, and a workaround like the one above completely breaks that contract.

Again, NOPL is not standard i686 - it's an extension supported by the vast, vast majority of i686 class processors. If you're compiling generically for i686, you cannot assume that it is present.

Bad patches like the above to work around broken compilation cause more harm than good, plastering over the real problem.

I've also had a brief look at the kernel.

http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=blob;f=arch/x86/Makefile_32.cpu;hb=HEAD

… special-cases the GX1 to -march=pentium-mmx and the LX to -march=geode, falling back to -march=pentium-mmx.

There is also a special case for the bug in binutils for !CONFIG_X86_P6_NOP where -mtune=generic32 is set.

And what about [x86: do not promote TM3x00/TM5x00 to i686-class] http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commitdiff;h=a7ef94e6889186848573a10c5bdb8271405f44de - that patch is based on the bad assumption that i686 includes NOPL when it does not.

I haven't looked thoroughly, there might be more places!

Cheers,

Nick

Martin Pitt (pitti) on 2010-10-20
Changed in update-manager (Ubuntu):
status: Won't Fix → Fix Committed
Changed in binutils (Ubuntu Maverick):
status: New → Invalid
Changed in eglibc (Ubuntu Maverick):
status: New → Invalid
Changed in gcc-4.4 (Ubuntu Maverick):
status: New → Invalid
Changed in update-manager (Ubuntu Maverick):
status: New → Fix Committed
tags: added: verification-needed
Changed in gcc-4.5 (Ubuntu Maverick):
status: New → Invalid
tags: added: verification-failed
removed: verification-needed
Changed in update-manager (Ubuntu Maverick):
assignee: nobody → Michael Vogt (mvo)
status: Fix Committed → In Progress
Steve Langasek (vorlon) on 2010-11-03
tags: removed: verification-failed
Martin Pitt (pitti) on 2010-11-12
Changed in update-manager (Ubuntu Maverick):
status: In Progress → Fix Committed
tags: added: verification-needed
Changed in update-manager (Ubuntu Maverick):
status: Fix Committed → Fix Released

The most recent glibc.i686 build for Rawhide ( http://koji.fedoraproject.org/koji/buildinfo?buildID=215507 ) appears to have reintroduced the NOPL instruction. See http://lists.fedoraproject.org/pipermail/test/2011-February/096805.html .

Re-opening and adding as a F15 Alpha blocker

Is there a risk that this new binutils version will spit NOPLs into a whole range of packages, not just glibc?

If so this should ideally be given attention this week, as the F15 rebuild is due.

I had a quick look at the changelog and NEWS files of the newest releases and nothing stood out. I took a look at the patches that fixed this originally against the current code, but I don't have the background knowledge to make any meaningful observations.

There is now:

* Mon Feb 7 2011 Jan Kratochvil <email address hidden> - 2.13.90-2
- Put back the assembler "workaround" - to disable the nopl instruction.

which just reverts:
* Tue Jan 25 2011 Andreas Schwab <email address hidden> - 2.13.90-1
[...]
- Remove no longer needed assembler workaround

so that nopl/nopw are no longer generated on i686.

Great. Thanks so much for addressing this at short notice :)

http://sourceware.org/ml/binutils/2011-02/msg00071.html
->
I believe glibc should therefore use -march=something, probably -march=i686.

Leaving notes on proposed blockers as I won't be at the meeting tomorrow most likely:

This hits "The installer must boot (if appropriate) and run on all primary architectures from default live image, DVD, and boot.iso install media", IMO, so is an Alpha blocker. +1

The fix is in glibc-2.13.90-2

AGREED: 579838 - accepted as Alpha blocker as this impacts all XO's and is already fixed.

This is *not* a glibc bug.

(In reply to comment #42)
> This is *not* a glibc bug.

Andreas, can you be more specific? I'd like to get as many details as possible on this, so we can get it assigned to the proper component.

(In reply to comment #43)
> (In reply to comment #42)
> > This is *not* a glibc bug.
>
> Andreas, can you be more specific? I'd like to get as many details as possible
> on this, so we can get it assigned to the proper component.

glibc should not contain NOPL instructions when compiled with -march=i686 against a recent binutils (GAS) version. A workaround in glibc should not be needed.

Refer to my previous comment.

Cheers,

Nick

@nickc - Any thoughts on how to proceed? This is currently listed as an F15Alpha release blocker. The Alpha release candidate compose is scheduled for this Friday (02-18). For this bug, we'll need either a workaround, a fix, or proof it does not qualify as an F15Alpha release blocker. Thanks!

With the workaround reapplied, it is clearly not a blocker. The fix is merely applied in the wrong place. It is just a band-aid, if you like.

The longer term view should be to correct things so that the optimal compilation options are used.

Now that binutils has been fixed, i686 is a sane baseline for a 32-bit build.
In my view, the entire 32-bit build of FC15 should be targeted for i686 with -march=i686, using a fixed version of binutils (GAS) that does not make bad assumptions about the presence of NOPL.

As for the compatibility impact of compiling for i686, which needs to be understood and digested:

VIA Edens based on the 'Samuel 2' design do not support CMOV or NOPL. (These would break.)
All VIA Edens based on the 'Nehemiah' design support CMOV but not NOPL. (Introduced in 2003. These would not break.)

VIA C3s based on the 'Samuel 2' or 'Ezra'/'Ezra-T' design do not support CMOV or NOPL. (These would break.)
All C3s based on the 'Nehemiah' design support CMOV but not NOPL. (Introduced in 2003. These would not break.)

National Semi's GXm, GXLV and GX1 do not support CMOV or NOPL. (These would break.)
All Geodes since and including National Semi's GX2 support CMOV but not NOPL. (Introduced in 2002. These would not break.)
The AMD branded Geodes (GX and LX) support CMOV but not NOPL. (These would not break.)

The Cyrix 6x86 processors do not support CMOV or NOPL. (These would break.)
The Cyrix 6x86MX / Cyrix MII do support CMOV but not NOPL. (These would not break.)

The AMD K6 and K6-2 do not support CMOV or NOPL. (These would break.)
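
To make the list above easier to digest, here is a small Python sketch of the same data. The table is transcribed from the comment above; the rule it applies is an assumption for illustration: an i686 build may use CMOV, and it only needs NOPL if the assembler emits it. The names CPUS, breaks and broken_families are made up for this sketch.

```python
# (family, supports CMOV, supports NOPL), transcribed from the list above.
CPUS = [
    ("VIA Eden (Samuel 2)", False, False),
    ("VIA Eden (Nehemiah)", True, False),
    ("VIA C3 (Samuel 2 / Ezra / Ezra-T)", False, False),
    ("VIA C3 (Nehemiah)", True, False),
    ("National Semi GXm / GXLV / GX1", False, False),
    ("National Semi GX2 and later Geodes", True, False),
    ("AMD Geode GX / LX", True, False),
    ("Cyrix 6x86", False, False),
    ("Cyrix MII", True, False),
    ("AMD K6 / K6-2", False, False),
]


def breaks(cmov: bool, nopl: bool, build_uses_nopl: bool) -> bool:
    """Assumed rule: an i686 build needs CMOV, and NOPL only if emitted."""
    return (not cmov) or (build_uses_nopl and not nopl)


def broken_families(build_uses_nopl: bool) -> list[str]:
    """Names of the listed families that would hit an illegal instruction."""
    return [name for name, cmov, nopl in CPUS
            if breaks(cmov, nopl, build_uses_nopl)]


# With a fixed assembler (no NOPL emitted) only the CMOV-less parts break.
print(broken_families(build_uses_nopl=False))
# With an assembler that emits NOPL, every family in this list breaks.
print(len(broken_families(build_uses_nopl=True)))
```

The point of the second call is that none of the processors listed here implement NOPL, so the assembler behaviour, not the choice of -march=i686 as such, decides whether the CMOV-capable ones keep working.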

This bug is for NOPL instructions existing in glibc. That bug is clearly now fixed, hence this should be closed. Nick, if you think that working around the issue in glibc is the wrong fix, can you please file a new bug against whichever component should be fixed so that glibc does not need to include a workaround (I'm guessing binutils)? It would be easier to track the correct issues that way; we should close this one so it's no longer blocking the Alpha release. Thanks.

Michael Vogt (mvo) on 2011-04-12
Changed in update-manager (Ubuntu Natty):
assignee: nobody → Michael Vogt (mvo)
status: Fix Committed → Fix Released
Changed in binutils (Fedora):
importance: Unknown → Medium
status: Unknown → Fix Released
Displaying first 40 and last 40 comments. View all 125 comments or add a comment.