Valgrind fails on most apps on amd64

Bug #148465 reported by Thomas Zander on 2007-10-03
Affects: Valgrind | Status: Fix Released | Importance: Medium
Affects: valgrind (Ubuntu) | Importance: Medium | Assigned to: Alexander Sack

Bug Description

A library I use triggers this problem every time:

vex amd64->IR: unhandled instruction bytes: 0x66 0x66 0x66 0x66
valgrind: Unrecognised instruction at address 0x4016321.
Your program just tried to execute an instruction that Valgrind
did not recognise.

I understand from a colleague that this has already been fixed in valgrind upstream; it would be great to get this fix shipped in gutsy.

Hi,

valgrind x86_64 currently stumbles over this instruction:

66 66 66 66 2e 0f 1f nopw %cs:0x0(%rax,%rax,1)

I think I have the same problem?

valgrind -v ./tst_GMAA_FSPC.debug
==1977== Memcheck, a memory error detector.
==1977== Copyright (C) 2002-2007, and GNU GPL'd, by Julian Seward et al.
==1977== Using LibVEX rev 1732, a library for dynamic binary translation.
==1977== Copyright (C) 2004-2007, and GNU GPL'd, by OpenWorks LLP.
==1977== Using valgrind-3.2.3-Debian, a dynamic binary instrumentation framework.
==1977== Copyright (C) 2000-2007, and GNU GPL'd, by Julian Seward et al.
==1977==
--1977-- Command line
--1977-- ./tst_GMAA_FSPC.debug
--1977-- Startup, with flags:
--1977-- --suppressions=/usr/lib/valgrind/debian-libc6-dbg.supp
--1977-- -v
--1977-- Contents of /proc/version:
--1977-- Linux version 2.6.18-4-amd64 (Debian 2.6.18.dfsg.1-12etch2) (<email address hidden>) (gcc version 4.1.2 20061115 (prerelease) (Debian 4.1.1-21)) #1 SMP Fri May 4 00:37:33 UTC 2007
--1977-- Arch and hwcaps: AMD64, amd64-sse2
--1977-- Page sizes: currently 4096, max supported 4096
--1977-- Valgrind library directory: /usr/lib/valgrind
--1977-- Reading syms from /home/faolieho/Documents/implementation/madp/trunk/src/tests/tst_GMAA_FSPC.debug (0x400000)
--1977-- Reading syms from /lib/ld-2.6.1.so (0x4000000)
--1977-- Reading debug info from /lib/ld-2.6.1.so...
--1977-- ... CRC mismatch (computed 635CD41D wanted 1F3B7BF3)
--1977-- object doesn't have a symbol table
--1977-- Reading syms from /usr/lib/valgrind/amd64-linux/memcheck (0x38000000)
--1977-- object doesn't have a dynamic symbol table
--1977-- Reading suppressions file: /usr/lib/valgrind/debian-libc6-dbg.supp
--1977-- Reading suppressions file: /usr/lib/valgrind/default.supp
vex amd64->IR: unhandled instruction bytes: 0x66 0x66 0x66 0x66
==1977== valgrind: Unrecognised instruction at address 0x4016321.
==1977== Your program just tried to execute an instruction that Valgrind
==1977== did not recognise. There are two possible reasons for this.
==1977== 1. Your program has a bug and erroneously jumped to a non-code
==1977== location. If you are running Memcheck and you just saw a
==1977== warning about a bad jump, it's probably your program's fault.
==1977== 2. The instruction is legitimate but Valgrind doesn't handle it,
==1977== i.e. it's Valgrind's fault. If you think this is the case or
==1977== you are not sure, please let us know and we'll try to fix it.
==1977== Either way, Valgrind will now raise a SIGILL signal which will
==1977== probably kill your program.
==1977==
==1977== Process terminating with default action of signal 4 (SIGILL)
==1977== Illegal opcode at address 0x4016321
==1977== at 0x4016321: (within /lib/ld-2.6.1.so)
==1977== by 0x4007CC2: (within /lib/ld-2.6.1.so)
==1977== by 0x4003329: (within /lib/ld-2.6.1.so)
==1977== by 0x4014457: (within /lib/ld-2.6.1.so)
==1977== by 0x400230A: (within /lib/ld-2.6.1.so)
==1977== by 0x4000A67: (within /lib/ld-2.6.1.so)
==1977==

*** This bug has been confirmed by popular vote. ***

Am looking at this now, but a bit confused because I can't reproduce the
failure on svn trunk or the 3.2 branch. Maybe 66 66 66 66 2e 0f 1f is
only the initial part of the instruction. Could one of you please send
the complete objdump -d output for the instruction so I can see what all
the instruction bytes are?

Sure:

derick@kossu:~$ objdump -d /lib/ld-2.6.1.so | grep "66 66 66 66"
     c13: 66 66 66 66 2e 0f 1f nopw %cs:0x0(%rax,%rax,1)
    13e3: 66 66 66 66 2e 0f 1f nopw %cs:0x0(%rax,%rax,1)
    55a1: 66 66 66 66 66 66 2e nopw %cs:0x0(%rax,%rax,1)
    8ed1: 66 66 66 66 66 66 2e nopw %cs:0x0(%rax,%rax,1)
    9aa3: 66 66 66 66 2e 0f 1f nopw %cs:0x0(%rax,%rax,1)
    d171: 66 66 66 66 66 66 2e nopw %cs:0x0(%rax,%rax,1)
    e191: 66 66 66 66 66 66 2e nopw %cs:0x0(%rax,%rax,1)
    e611: 66 66 66 66 66 66 2e nopw %cs:0x0(%rax,%rax,1)
    ede3: 66 66 66 66 2e 0f 1f nopw %cs:0x0(%rax,%rax,1)
    f752: 66 66 66 66 66 2e 0f nopw %cs:0x0(%rax,%rax,1)
   106f3: 66 66 66 66 2e 0f 1f nopw %cs:0x0(%rax,%rax,1)
   107d1: 66 66 66 66 66 66 2e nopw %cs:0x0(%rax,%rax,1)
   10ae1: 66 66 66 66 66 66 2e nopw %cs:0x0(%rax,%rax,1)
   118a1: 66 66 66 66 66 66 2e nopw %cs:0x0(%rax,%rax,1)
   13bc2: 66 66 66 66 66 2e 0f nopw %cs:0x0(%rax,%rax,1)
   148e2: 66 66 66 66 66 2e 0f nopw %cs:0x0(%rax,%rax,1)
   14961: 66 66 66 66 66 66 2e nopw %cs:0x0(%rax,%rax,1)
   15111: 66 66 66 66 66 66 2e nopw %cs:0x0(%rax,%rax,1)
   156c1: 66 66 66 66 66 66 2e nopw %cs:0x0(%rax,%rax,1)
   15f93: 66 66 66 66 2e 0f 1f nopw %cs:0x0(%rax,%rax,1)
   160e1: 66 66 66 66 66 66 2e nopw %cs:0x0(%rax,%rax,1)
   16321: 66 66 66 66 66 66 2e nopw %cs:0x0(%rax,%rax,1)
   16d12: 66 66 66 66 66 2e 0f nopw %cs:0x0(%rax,%rax,1)

Full dump is here:
http://files.derickrethans.nl/ld.dump.txt

Whoopsie, indeed. The complete context is:

     ab7: c3 retq
     ab8: 0f 1f 84 00 00 00 00 nopl 0x0(%rax,%rax,1)
     abf: 00
     ac0: 83 47 04 01 addl $0x1,0x4(%rdi)
     ac4: c3 retq
     ac5: 66 66 2e 0f 1f 84 00 nopw %cs:0x0(%rax,%rax,1)
     acc: 00 00 00 00
     ad0: 83 6f 04 01 subl $0x1,0x4(%rdi)
     ad4: c3 retq
     ad5: 66 66 2e 0f 1f 84 00 nopw %cs:0x0(%rax,%rax,1)

Um, ok. I still can't reproduce it on amd64 using the program below.
What am I doing wrong?

int main ( void )
{
  __asm__ __volatile__(
     ".byte 0x66\n\t"
     ".byte 0x66\n\t"
     ".byte 0x66\n\t"
     ".byte 0x66\n\t"
     ".byte 0x2e\n\t"
     ".byte 0x0f\n\t"
     ".byte 0x1f\n\t"
     ".byte 0x84\n\t"
     ".byte 0x00\n\t"
     ".byte 0x00\n\t"
     ".byte 0x00\n\t"
     ".byte 0x00\n\t"
     ".byte 0x00\n"
  );
  return 0;
}

I even get it with an empty executable:

int main ( void )
{
  return 0;
}

$ gcc -static -o prog-test prog.c

$ valgrind ./prog-test
==25513== Memcheck, a memory error detector.
==25513== Copyright (C) 2002-2007, and GNU GPL'd, by Julian Seward et al.
==25513== Using LibVEX rev 1732, a library for dynamic binary translation.
==25513== Copyright (C) 2004-2007, and GNU GPL'd, by OpenWorks LLP.
==25513== Using valgrind-3.2.3-Debian, a dynamic binary instrumentation framework.
==25513== Copyright (C) 2000-2007, and GNU GPL'd, by Julian Seward et al.
==25513== For more details, rerun with: -v
==25513==
vex amd64->IR: unhandled instruction bytes: 0x66 0x66 0x66 0x66
==25513== valgrind: Unrecognised instruction at address 0x451C22.
==25513== Your program just tried to execute an instruction that Valgrind
==25513== did not recognise. There are two possible reasons for this.
==25513== 1. Your program has a bug and erroneously jumped to a non-code
==25513== location. If you are running Memcheck and you just saw a
==25513== warning about a bad jump, it's probably your program's fault.
==25513== 2. The instruction is legitimate but Valgrind doesn't handle it,
==25513== i.e. it's Valgrind's fault. If you think this is the case or
==25513== you are not sure, please let us know and we'll try to fix it.
==25513== Either way, Valgrind will now raise a SIGILL signal which will
==25513== probably kill your program.
==25513==
==25513== Process terminating with default action of signal 4 (SIGILL)
==25513== Illegal opcode at address 0x451C22
==25513== at 0x451C22: strpbrk (in /tmp/prog-test)
==25513== by 0x448229: strsep (in /tmp/prog-test)
==25513== by 0x42C8B0: fillin_rpath (in /tmp/prog-test)
==25513== by 0x42E6DB: _dl_init_paths (in /tmp/prog-test)
==25513== by 0x40AFBE: _dl_non_dynamic_init (in /tmp/prog-test)
==25513== by 0x40B6CA: __libc_init_first (in /tmp/prog-test)
==25513== by 0x400403: (below main) (in /tmp/prog-test)
==25513==
==25513== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 0 from 0)
==25513== malloc/free: in use at exit: 0 bytes in 0 blocks.
==25513== malloc/free: 0 allocs, 0 frees, 0 bytes allocated.
==25513== For counts of detected errors, rerun with: -v
==25513== All heap blocks were freed -- no leaks are possible.
Illegal instruction

$ valgrind --version
valgrind-3.2.3-Debian

I'll try fresh sources now, see if that helps. Anything else I could try ?

That didn't go very far:

configure: error: Valgrind requires glibc version 2.2 - 2.5

$ dpkg -p libc6
Package: libc6
Priority: required
Section: libs
Installed-Size: 11328
Maintainer: GNU Libc Maintainers <email address hidden>
Architecture: amd64
Source: glibc
Version: 2.6.1-1
Provides: glibc-2.6-1
Depends: libgcc1
Suggests: locales, glibc-doc
Conflicts: libterm-readline-gnu-perl (<< 1.15-2), tzdata (<< 2007e-2)
Size: 4911700
Description: GNU C Library: Shared libraries
 Contains the standard libraries that are used by nearly all programs on
 the system. This package includes shared versions of the standard C library
 and the standard math library, as well as many others.

Just to make sure, does everybody with this bug use Debian (testing/lenny) on AMD64, with libc 2.6.1-1?

> dpkg -s libc6
Package: libc6
Status: install ok installed
Priority: required
Section: libs
Installed-Size: 11328
Maintainer: GNU Libc Maintainers <email address hidden>
Architecture: amd64
Source: glibc
Version: 2.6.1-1
Provides: glibc-2.6-1
Depends: libgcc1
Suggests: locales, glibc-doc
...

> md5sum /lib/ld-2.6.1.so
f68b7e0311528195934658fa43a67cb6 /lib/ld-2.6.1.so

I also saw it on the suse list:

https://bugzilla.novell.com/show_bug.cgi?id=296803#c1

It says: "Dirk Mueller told me that this was triggered by the new binutils which uses a new way of writing NOPs which is not yet known to valgrind."

What I need is for someone to construct a modified version of the
program I posted, which does cause Valgrind to bomb when it runs
the __asm__ __volatile__ section (and not before that point).
I can't figure out how to do so, although I could be doing something
stupid.

int main ( void )
{
  // 66 66 66 66 66 66 2e
  __asm__ __volatile__(
     ".byte 0x66\n\t"
     ".byte 0x66\n\t"
     ".byte 0x66\n\t"
     ".byte 0x66\n\t"
     ".byte 0x66\n\t"
     ".byte 0x66\n\t"
     ".byte 0x2e\n\t"
     ".byte 0x0f\n\t"
     ".byte 0x1f\n\t"
     ".byte 0x84\n\t"
     ".byte 0x00\n\t"
     ".byte 0x00\n\t"
     ".byte 0x00\n\t"
     ".byte 0x00\n\t"
     ".byte 0x00\n"
  );
  return 0;
}

The surrounding code is:

   14755: ff c1 inc %ecx
   14757: 48 8d 76 01 lea 0x1(%rsi),%rsi
   1475b: 48 8d 7f 01 lea 0x1(%rdi),%rdi
   1475f: 75 ef jne 14750 <calloc+0x13e0>
   14761: 66 66 66 66 66 66 2e nopw %cs:0x0(%rax,%rax,1)
   14768: 0f 1f 84 00 00 00 00
   1476f: 00

   1475f: 75 ef jne 14750 <calloc+0x13e0>
   14761: 66 66 66 66 66 66 2e nopw %cs:0x0(%rax,%rax,1)
   14768: 0f 1f 84 00 00 00 00
   1476f: 00
   14770: 48 81 fa 00 04 00 00 cmp $0x400,%rdx
   14777: 77 77 ja 147f0 <calloc+0x1480>
   14779: 89 d1 mov %edx,%ecx

Better paste. It's very hard to find an application that doesn't use calloc() ;(

Ah, my mistake. My test case did not have enough 66s. Now fixed;
vex r1776 - a one byte change :-)

Index: priv/guest-amd64/toIR.c
===================================================================
--- priv/guest-amd64/toIR.c (revision 1775)
+++ priv/guest-amd64/toIR.c (working copy)
@@ -8387,7 +8387,7 @@
       as many invalid combinations as possible. */
    n_prefixes = 0;
    while (True) {
- if (n_prefixes > 5) goto decode_failure;
+ if (n_prefixes > 7) goto decode_failure;
       pre = getUChar(delta);
       switch (pre) {
          case 0x66: pfx |= PFX_66; break;

> Ah, my mistake. My test case did not have enough 66s. Now fixed;
> vex r1776 - a one byte change :-)

And vex r1777 on the 3.2 branch.

Works great, thanks!

yes, indeed. Great work, thanks!

Fixed in both trunk and 3.2 branch.

*** Bug 150408 has been marked as a duplicate of this bug. ***

Sebastien Bacher (seb128) wrote :

Do you have a link to the upstream fix?

Christophe Fergeau (teuf-gnome) wrote :

Happens for me too with something as simple as
valgrind cat ~/Anniversaires.html

Thomas Zander (zander-kde) wrote :

Upstream bug (that's been closed for some time):

http://bugs.kde.org/show_bug.cgi?id=148447

It would seem that comment #16 there contains the commit that fixes this.

It would be great to get this fix shipped in gutsy.

Sebastien Bacher (seb128) wrote :

confirming since there is an upstream bug and a patch available

Changed in valgrind:
importance: Undecided → Medium
status: New → Confirmed

Alexander Sack (asac) wrote :

will test the patch and if it helps upload the fix.

Thanks,
 - Alexander

Changed in valgrind:
assignee: nobody → asac
status: Confirmed → In Progress

Alexander Sack (asac) wrote :

the patch fixes the issue for me. uploading

Changed in valgrind:
status: In Progress → Fix Committed

Alexander Sack (asac) wrote :

valgrind (1:3.2.3-2ubuntu3) gutsy; urgency=low

  * debian/patches/21_amd64_kde148447_NEW_nop_codes.dpatch:
    - import patch from http://bugs.kde.org/show_bug.cgi?id=148447#c16 to fix
      LP: #148465.

 -- Alexander Sack <email address hidden> Sat, 06 Oct 2007 02:27:32 +0200

Changed in valgrind:
status: Fix Committed → Fix Released

Thomas Zander (zander-kde) wrote :

I can confirm this works perfectly; thanks a million guys!

Christophe Fergeau (teuf-gnome) wrote :

Works for me as well :)

Changed in valgrind:
status: Unknown → Fix Released
Changed in valgrind:
importance: Unknown → Medium

Hello, I have a similar problem when trying to run valgrind (version 3.9) with digikam:

http://pastebin.com/G1tEyJEe
