strstr() function produces wrong results under valgrind callgrind

Bug #1027977 reported by Moritz Hassert on 2012-07-23
14
This bug affects 1 person
Affects Status Importance Assigned to Milestone
Valgrind
Fix Released
Medium
valgrind (Ubuntu)
Undecided
Unassigned
Precise
Low
Unassigned
Quantal
Undecided
Unassigned

Bug Description

EDIT: adapted description according to SRU Bug Template

[IMPACT]

 * impact on users:
Profiling an application with callgrind produces false results: The application silently changes behavior because of false strstr() results in certain cases.

 * justification for backporting the fix to the stable release:
- 12.04 is a LTS release. Lots of people installed it for that very reason and intend to stick with it until the next LTS release. Especially as this bug affects mainly software developers and in professional environments the upgrade cycles are longer.
Those people will not benefit from a fix in the upcoming release.
- The patch is very small and local.
- There is no danger in backporting it (see Regression Potential below).
- The fix is already in 12.10 and could be taken directly from there without any hassle.

 * The emulation of a certain SSE4-instruction in the valgrind package in 12.04 is flawed. This bug is fixed by a patch made by the upstream author.

The debdiff of 1:3.7.0-0ubuntu4 can and should be backported without change to precise. The other fixed issue is also SRU material, see bug 1036283
debdiff in comment 26

[TESTCASE]

When an application that uses the strstr() function from the C standard library is profiled with valgrind --tool=callgrind, the strstr() function produces false results (at least) under the following conditions:
* the string s1 to search in and the string s2 to search for are exact duplicates, that is strcmp(s1,s2)==0. s1 and s2 don't need to be pointing to the same memory object.
* the string length (excluding terminating zero) is a multiple of 16

Expected result: strstr(s1,s2) returns s1, indicating a match at the first charactor of s1
What happens: strstr(s1,s2) returns NULL, indicating no matching substring was found.

See attached minimal testcase for an example. Reproduce under Ubuntu 12.04 with the following steps:
$gcc strstrtest.c -o strstrtest
$./ strstrtest # <-- should report no errors
$valgrind --tool=callgrind ./ strstrtest # <-- should report errors for lengths multiple of 16

- The Problem does not show up under valgrind-3.6.0.SVN-Debian from Ubuntu 10.04 Lucid Lynx
- The Problem does not show up under tool=memcheck.

[Regression Potential]

 * I don't see any danger of regressions. There is no change in behavior other than correcting false behavior in one place. No other applications depend on valgrind/callgrind, especially nothing that a normal user or server administrator ever uses.
* I've been using the patched version for 2 months now without any problems.
* If this should introduce any sort of regression, it will only affect valgrind/callgrind itself and no other parts of the system.

---
Old description:

$valgrind --version
valgrind-3.7.0

When an application that uses the strstr() function from the C standard library is profiled with valgrind --tool=callgrind, the strstr() function produces false results (at least) under the following conditions:
* the string s1 to search in and the string s2 to search for are exact duplicates, that is strcmp(s1,s2)==0. s1 and s2 don't need to be pointing to the same memory object.
* the string length (excluding terminating zero) is a multiple of 16

Expected result: strstr(s1,s2) returns s1, indicating a match at the first charactor of s1
What happens: strstr(s1,s2) returns NULL, indicating no matching substring was found.

See attached minimal testcase for an example. Reproduce under Ubuntu 12.04 with the following steps:
$gcc strstrtest.c -o strstrtest
$./ strstrtest # <-- should report no errors
$valgrind --tool=callgrind ./ strstrtest # <-- should report errors for lengths multiple of 16

- The Problem does not show up under valgrind-3.6.0.SVN-Debian from Ubuntu 10.04 Lucid Lynx
- The Problem does not show up under tool=memcheck.

Some more info:
OS: Ubuntu 12.04 Precise Pangolin
$uname -a
Linux mhassert 3.2.0-26-generic #41-Ubuntu SMP Thu Jun 14 17:49:24 UTC 2012 x86_64 x86_64 x86_64 GNU/Linux

Related branches

Moritz Hassert (mhassert) wrote :

Created attachment 72705
minimal test case

$valgrind --version
valgrind-3.7.0

When an application that uses the strstr() function from the C standard library is profiled with valgrind --tool=callgrind, the strstr() function produces false results (at least) under the following conditions:
* the string s1 to search in and the string s2 to search for are exact duplicates, that is strcmp(s1,s2)==0. s1 and s2 don't need to be pointing to the same memory object.
* the string length (excluding terminating zero) is a multiple of 16

Expected result: strstr(s1,s2) returns s1, indicating a match at the first charactor of s1
What happens: strstr(s1,s2) returns NULL, indicating no matching substring was found.

See attached minimal testcase for an example. Reproduce under Ubuntu 12.04 with the following steps:
$gcc strstrtest.c -o strstrtest
$./ strstrtest # <-- should report no errors
$valgrind --tool=callgrind ./ strstrtest # <-- should report errors for lengths multiple of 16

- The Problem does not show up under valgrind-3.6.0.SVN-Debian from Ubuntu 10.04 Lucid Lynx
- The Problem does not show up under tool=memcheck.

Some more info:
OS: Ubuntu 12.04 Precise Pangolin
$uname -a
Linux mhassert 3.2.0-26-generic #41-Ubuntu SMP Thu Jun 14 17:49:24 UTC 2012 x86_64 x86_64 x86_64 GNU/Linux

Moritz Hassert (mhassert) wrote :

Also created an upstream bug report:
https://bugs.kde.org/show_bug.cgi?id=303963

I actually can reproduce this, both with VG 3.7.0 and VG 3.8.0.SVN.
This looks scary.

The strange thing is that callgrind just runs the original code, and does not
do any tricky redirections similar to memcheck. Therefore this bug also
happens with the tools none and cachegrind (just checked). It at least
looks like something bad is going on translating strstr machine code
in VEX.

/usr/bin/valgrind --tool=none ./strstrtest
==21970== Nulgrind, the minimal Valgrind tool
==21970== Copyright (C) 2002-2011, and GNU GPL'd, by Nicholas Nethercote.
==21970== Using Valgrind-3.7.0 and LibVEX; rerun with -h for copyright info
==21970== Command: ./strstrtest
==21970==
TEST1: strstr failed
TEST2: length 16: failed
...

Changing to component "general".

The bug obviously happens in __strstr_sse42 (on 64bit).

I just compared the control flow inside __strstr_sse42 on a real machine
(using gdb) and within valgrind (just running callgrind with "--collect-jumps=yes").

It looks like the VEX emulation of "pcmpistri" (used with mode "equal ordered")
is buggy.

(In reply to comment #3)
 > It looks like the VEX emulation of "pcmpistri" (used with mode "equal
> ordered") is buggy.

Josef, I am not surprised to hear that (+ thanks for chasing it). Which pcmpistri case
is it (iow, which immediate byte) ?

(In reply to comment #4)
> Josef, I am not surprised to hear that (+ thanks for chasing it). Which
> pcmpistri case
> is it (iow, which immediate byte) ?

It's imm8 = 0xc. I think I found the problem:

There is a discrepancy between real hardware and emulation if the needle is an empty string, ie. starts with "\0". This happens on the last call to pcmpistri in __strstr_sse42 when the needle length is a multiple of 16.

For all positions in the haystack, VEX stops search whenever the end of the needle is found. As it starts with assuming "no hit found", all search results will be false.
In contrast to that, according to table 4-3 in the SDM, the real pcmpistri
starts with the assumption "hit found".

Patch, which makes the test case of this bug report work
(needs also be changed for the _wide variant):

--- a/VEX/priv/guest_generic_x87.c
+++ b/VEX/priv/guest_generic_x87.c
@@ -891,7 +891,7 @@ Bool compute_PCMPxSTRx ( /*OUT*/V128* resV,
       UInt ni, hi;
       UChar* argL = (UChar*)argLV;
       UChar* argR = (UChar*)argRV;
- UInt boolRes = 0;
+ UInt boolRes = 0xFFFF;
       UInt validL = ~(zmaskL | -zmaskL); // not(left(zmaskL))
       UInt validR = ~(zmaskR | -zmaskR); // not(left(zmaskR))
       for (hi = 0; hi < 16; hi++) {
@@ -905,7 +905,7 @@ Bool compute_PCMPxSTRx ( /*OUT*/V128* resV,
             if (i >= 16) break;
             if (argL[i] != argR[ni]) { m = 0; break; }
          }
- boolRes |= (m << hi);
+ if (m==0) boolRes &= ~(1 << hi);
       }

Of course it would be nice to add some test cases.

The diff below adds test cases.

Index: none/tests/amd64/pcmpstr64.c
===================================================================
--- none/tests/amd64/pcmpstr64.c (revision 12776)
+++ none/tests/amd64/pcmpstr64.c (working copy)
@@ -639,6 +639,11 @@
    try_istri(wot,h,s, "1111111111111234", "1111111111111234");
    try_istri(wot,h,s, "a111111111111111", "000000000000000a");
    try_istri(wot,h,s, "b111111111111111", "000000000000000a");
+
+ try_istri(wot,h,s, "b111111111111111", "0000000000000000");
+ try_istri(wot,h,s, "0000000000000000", "0000000000000000");
+ try_istri(wot,h,s, "123456789abcdef1", "0000000000000000");
+ try_istri(wot,h,s, "0000000000000000", "123456789abcdef1");
 }

Index: none/tests/amd64/pcmpstr64w.c
===================================================================
--- none/tests/amd64/pcmpstr64w.c (revision 12776)
+++ none/tests/amd64/pcmpstr64w.c (working copy)
@@ -638,6 +638,11 @@
    try_istri(wot,h,s, "1111111111111234", "1111111111111234");
    try_istri(wot,h,s, "0a11111111111111", "000000000000000a");
    try_istri(wot,h,s, "0b11111111111111", "000000000000000a");
+
+ try_istri(wot,h,s, "b111111111111111", "0000000000000000");
+ try_istri(wot,h,s, "0000000000000000", "0000000000000000");
+ try_istri(wot,h,s, "123456789abcdef1", "0000000000000000");
+ try_istri(wot,h,s, "0000000000000000", "123456789abcdef1");
 }

(In reply to comment #5)

Josef, thanks for chasing this:

> Patch, which makes the test case of this bug report work
> (needs also be changed for the _wide variant):

That unfortunately causes other test cases to fail.

The test programs (none/tests/amd64/pcmpstr64{,w}.c) can be used to
test the C logic without using Valgrind, because they contain a copy
of same code that is in guest_generic_x87.c and they compare the
output against the real instruction. Hence you can do this:

gcc -Wall -g -O -o foo none/tests/amd64/pcmpstr64.c && ./foo | grep "\!\!\!"

and it will show any lines where the C simulation is wrong. With
the test case additions in the previous comment, I get:

sewardj@phoenix:~/VgTRUNK/trunk$ gcc -Wall -g -O -o foo none/tests/amd64/pcmpstr64.c && ./foo | grep "\!\!\!"
istri 0C 00abcde100bcde11 00000000000abcde -> 00c00010 00c10006 !!!!
istri 0C 0000000000000000 123456789abcdef1 -> 00400010 08410000 !!!!

Forget the previous patch.
Actually, it is enough to move the break condition checking the end of
the haystack to the end of the loop. This allows an empty needle to match
an empty haystack. Passes all tests, including the new ones from
comment 6.

--- a/priv/guest_generic_x87.c
+++ b/priv/guest_generic_x87.c
@@ -895,9 +895,6 @@ Bool compute_PCMPxSTRx ( /*OUT*/V128* resV,
       UInt validL = ~(zmaskL | -zmaskL); // not(left(zmaskL))
       UInt validR = ~(zmaskR | -zmaskR); // not(left(zmaskR))
       for (hi = 0; hi < 16; hi++) {
- if ((validL & (1 << hi)) == 0)
- // run off the end of the haystack
- break;
          UInt m = 1;
          for (ni = 0; ni < 16; ni++) {
             if ((validR & (1 << ni)) == 0) break;
@@ -906,6 +903,9 @@ Bool compute_PCMPxSTRx ( /*OUT*/V128* resV,
             if (argL[i] != argR[ni]) { m = 0; break; }
          }
          boolRes |= (m << hi);
+ if ((validL & (1 << hi)) == 0)
+ // run off the end of the haystack
+ break;
       }

Created attachment 72727
Patch for VEX

Created attachment 72728
Patch for VG (test case)

> Created attachment 72727 [details]
> Patch for VEX
> Created attachment 72728 [details]
> Patch for VG (test case)

Looks fine, please commit!

Changed in valgrind:
importance: Unknown → Medium
status: Unknown → Fix Released
Moritz Hassert (mhassert) wrote :

Setting to confirmed because it is confirmed and fixed upstream

Changed in valgrind (Ubuntu):
status: New → Confirmed

Hi Josef and Julian. Thanks for fixing the bug so fast!
As far as I understand the patches are for now only in cvs head? I'd really love to get this fixed in the debian/ubuntu package as fast as possible. Do you plan on releasing a patched version of valgrind-3.7.0 in the near future? Or should I contact the debian/ubuntu package maintainers to include your attached patches? In the latter case, do you see any problems applying them to the 3.7.0 release?

I do not know about plans for 3.7.1.
I would simply take the sources, patch them with the fix attached to this bug report, and compile it yourself. Or is there a reason to care about other Ubuntu users?

Moritz Hassert (mhassert) wrote :

The attached patch is taken from the fix of the upstream bug report. I modified it slightly to fit version 3.7.0 in the debian/ubuntu package:
- removed the second part as that function does not yet exist in 3.7.0
- Corrected line numbers of chunks to avoid warning when patch is applied.

The attachment "adapted version of upstream patch" of this bug report has been identified as being a patch. The ubuntu-reviewers team has been subscribed to the bug report so that they can review the patch. In the event that this is in fact not a patch you can resolve this situation by removing the tag 'patch' from the bug report and editing the attachment so that it is not flagged as a patch. Additionally, if you are member of the ubuntu-reviewers team please also unsubscribe the team from this bug report.

[This is an automated message performed by a Launchpad user owned by Brian Murray. Please contact him regarding any issues with the action taken in this bug report.]

tags: added: patch
Moritz Hassert (mhassert) wrote :

I built a patched version of the current valgrind package using fix-VEX-PCMPxSTRx.patch
Attached is the output of debdiff.
I'v been using this patched version for a day now and it works fine for me.

Yes, compiling myself was of course the quick solution. I patched and built the valgrind package from Ubuntu locally and it works like a charm. Thanks again! Saves me lots of trouble.

There's no particular reason to care about other Ubuntu users. It's not really an security issue or anything like that. But I do care about myself and all the machines I would have to install my custom deb package.

I adapted your "Patch for VEX" patch for version 3.7.0. It's attached to my corresponding Ubuntu bug report:
https://bugs.launchpad.net/ubuntu/+source/valgrind/+bug/1027977

Julian Taylor (jtaylor) on 2012-10-05
Changed in valgrind (Ubuntu):
assignee: nobody → Julian Taylor (jtaylor)
Julian Taylor (jtaylor) on 2012-10-05
Changed in valgrind (Ubuntu):
assignee: Julian Taylor (jtaylor) → nobody
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package valgrind - 1:3.7.0-0ubuntu4

---------------
valgrind (1:3.7.0-0ubuntu4) quantal; urgency=low

  * fix-buffer-overflows.patch: fix overflows in vgdb
  * 05_fix-callgrind_control.patch: fix valgrind process name (LP: #1036283)
  * fix-VEX-PCMPxSTRx.patch: fix strstr handling (LP: #1027977)
 -- Julian Taylor <email address hidden> Fri, 05 Oct 2012 20:16:28 +0200

Changed in valgrind (Ubuntu):
status: Confirmed → Fix Released
Moritz Hassert (mhassert) wrote :

Hi Julian,

nice to see the patch made it into upcoming quatal/12.10.

Any chance to get it also into precise/12.04? After all it's a LTS release and lot's of machines (especially development machines) won't make the switch to 12.10 and so would be stuck with the bug for years.

Julian Taylor (jtaylor) wrote :

can you please edit the bug description according to this template:
https://wiki.ubuntu.com/StableReleaseUpdates#SRU_Bug_Template

I unfortunately do not have the hardware to actually reproduce the issue (no sse4).

description: updated
Julian Taylor (jtaylor) on 2012-10-10
description: updated
Julian Taylor (jtaylor) wrote :
description: updated
Sebastien Bacher (seb128) wrote :

Thanks Julian for the SRU work, the SRU team tries to ensure that every changeset is documented with a bug/testcase, that doesn't seem to be the case for "* fix-buffer-overflows.patch: fix overflows in vgdb" ... is there any way you could link that to a launchpad bug or at least document the change on this launchpad ticket?

Changed in valgrind (Ubuntu Precise):
importance: Undecided → Low
status: New → Triaged
Julian Taylor (jtaylor) wrote :

without the patch vgdb crashes in quantal with gcc-4.7 in the testcase mentioned in the linked bug
I could not reproduce it on precise at the time, but given the simplicity I think its a good idea to fix the issue there too.

attached a debdiff with better description and bug links

Marc Deslauriers (mdeslaur) wrote :

precise debdiff looks good, thanks!

Uploading to precise-proposed, and subscribing the ubuntu-sru team for processing.

Changed in valgrind (Ubuntu Quantal):
status: New → Fix Released

Hello Moritz, or anyone else affected,

Accepted valgrind into precise-proposed. The package will build now and be available at http://launchpad.net/ubuntu/+source/valgrind/1:3.7.0-0ubuntu3.1 in a few hours, and then in the -proposed repository.

Please help us by testing this new package. See https://wiki.ubuntu.com/Testing/EnableProposed for documentation how to enable and use -proposed. Your feedback will aid us getting this update out to other Ubuntu users.

If this package fixes the bug for you, please add a comment to this bug, mentioning the version of the package you tested, and change the tag from verification-needed to verification-done. If it does not fix the bug for you, please add a comment stating that, and change the tag to verification-failed. In either case, details of your testing will help us make a better decision.

Further information regarding the verification process can be found at https://wiki.ubuntu.com/QATeam/PerformingSRUVerification . Thank you in advance!

Changed in valgrind (Ubuntu Precise):
status: Triaged → Fix Committed
tags: added: verification-needed

Checked the test case pre installing the proposed version to see the issue (which I did) and tested post installing the proposed version for precise and all is working as expected here.

tags: added: verification-done-precise
tags: removed: verification-needed

The verification of this Stable Release Update has completed successfully and the package has now been released to -updates. Subsequently, the Ubuntu Stable Release Updates Team is being unsubscribed and will not receive messages about this bug report. In the event that you encounter a regression using the package from -updates please report a new bug using ubuntu-bug and tag the bug report regression-update so we can easily find any regresssions.

Launchpad Janitor (janitor) wrote :

This bug was fixed in the package valgrind - 1:3.7.0-0ubuntu3.1

---------------
valgrind (1:3.7.0-0ubuntu3.1) precise-proposed; urgency=low

  * fix-buffer-overflows.patch: fix overflows in vgdb
  * 05_fix-callgrind_control.patch: fix valgrind process name (LP: #1036283)
  * fix-VEX-PCMPxSTRx.patch: fix strstr handling (LP: #1027977)
 -- Julian Taylor <email address hidden> Wed, 10 Oct 2012 20:16:28 +0200

Changed in valgrind (Ubuntu Precise):
status: Fix Committed → Fix Released
To post a comment you must log in.
This report contains Public information  Edit
Everyone can see this information.

Other bug subscribers

Remote bug watches

Bug watches keep track of this bug in other bug trackers.