The basic semantics for choosing the NOP sequence were completely wrong. This has been fixed now.
The NOPL instruction is not supported by all i686 processors, the coded assumption was that they all did. This has been changed by the recent AMD patches linked to by Quentin Neill so that it is not assumed and it's specified as an extension where it is supported.
(NOPL is not standard i686, it was undocumented and has just been de facto supported since the Pentium Pro.)
The benefit of full i686 optimisation, which do have real performance implications, are things like bswap (useful in networking), cmpxchg/xadd (used in atomics) and cmov (useful in compiler generated code).
The correct solution is surely to ensure that when something is compiled for generic i686, NOPL is nowhere to be seen...
Once the AMD patches that fix the bad semantics for NOPL and i686 in binutils (GAS) have been applied, the build of eglibc should be changed to restore i686 optimisation.
The frustrating thing is that broken semantics for i686 have led to absurd patches such as the following being proposed as a workaround:
The point that gets missed there is that, abstractly, a NOP is meant to be exactly as it says on the tin, a No Operation.
It's meant to do nothing at all - for a predefined number of clock cycles.
A NOP is commonly used for timing purposes, that completely breaks that contract.
Again, NOPL is not standard i686 - it's an extension supported by the vast, vast majority of i686 class processors. If you're compiling generically for i686, you cannot assume that it is present.
Bad patches like the above to work around broken compilation cause more harm than good, plastering over the real problem.
This should really be revisited.
The basic semantics for choosing the NOP sequence were completely wrong. This has been fixed now.
The NOPL instruction is not supported by all i686 processors, the coded assumption was that they all did. This has been changed by the recent AMD patches linked to by Quentin Neill so that it is not assumed and it's specified as an extension where it is supported.
(NOPL is not standard i686, it was undocumented and has just been de facto supported since the Pentium Pro.)
The benefit of full i686 optimisation, which do have real performance implications, are things like bswap (useful in networking), cmpxchg/xadd (used in atomics) and cmov (useful in compiler generated code).
The correct solution is surely to ensure that when something is compiled for generic i686, NOPL is nowhere to be seen...
Once the AMD patches that fix the bad semantics for NOPL and i686 in binutils (GAS) have been applied, the build of eglibc should be changed to restore i686 optimisation.
The frustrating thing is that broken semantics for i686 have led to absurd patches such as the following being proposed as a workaround:
http:// groups. google. com/group/ linux.kernel/ browse_ thread/ thread/ c1ec68f5498236d c/617726bec3159 5ed?show_ docid=617726bec 31595ed
The point that gets missed there is that, abstractly, a NOP is meant to be exactly as it says on the tin, a No Operation.
It's meant to do nothing at all - for a predefined number of clock cycles.
A NOP is commonly used for timing purposes, that completely breaks that contract.
Again, NOPL is not standard i686 - it's an extension supported by the vast, vast majority of i686 class processors. If you're compiling generically for i686, you cannot assume that it is present.
Bad patches like the above to work around broken compilation cause more harm than good, plastering over the real problem.
I've also had a brief look at the kernel.
http:// git.kernel. org/?p= linux/kernel/ git/torvalds/ linux-2. 6.git;a= blob;f= arch/x86/ Makefile_ 32.cpu; hb=HEAD
… special cases the GX1 to -march=pentium-mmx and the LX to -march= geode,- march=pentium- mmx.
There is also a special case for the bug in binutils for !CONFIG_X86_P6_NOP where -mtune=generic32 is set.
And what about [x86: do not promote TM3x00/TM5x00 to i686-class] http:// git.kernel. org/?p= linux/kernel/ git/torvalds/ linux-2. 6.git;a= commitdiff; h=a7ef94e688918 6848573a10c5bdb 8271405f44de - that patch is based on the bad assumption that i686 includes NOPL when it does not.
I haven't looked thoroughly, there might be more places!
Cheers,
Nick