strstr broken for some inputs on pre-SSE4 machines

Bug #655463 reported by Steve Atwell on 2010-10-06
18
This bug affects 1 person
Affects Status Importance Assigned to Milestone
Release Notes for Ubuntu
Undecided
Unassigned
eglibc
Fix Released
Medium
eglibc (Ubuntu)
High
Matthias Klose
Nominated for Hardy by vk3127202
Nominated for Karmic by vk3127202
Lucid
High
Matthias Klose
Maverick
High
Matthias Klose

Bug Description

# lsb_release -rd
Description: Ubuntu 10.04.1 LTS
Release: 10.04

# apt-cache policy libc6
libc6:
  Installed: 2.11.1-0ubuntu7.2
  Candidate: 2.11.1-0ubuntu7.2

The strstr function is broken for certain classes of inputs. This has already been reported in upstream glibc at http://sourceware.org/bugzilla/show_bug.cgi?id=12092. The bug includes a proposed fix.

I've verified that libc6 2.11.1-0ubuntu7.2 on Lucid exhibits this broken behavior on a pre-SSE4 machine (a Xeon L5335).

I'm attaching Paul Pluzhnikov's code snippet from the upstream bug that demonstrates the broken behavior.

Created attachment 5035
what appears to be minimal test case

Attached test case fails with glibc-2.11.1 and with current git trunk; passes with glibc-2.7.

The failure I see on SSE2 and SSE3 machines is:
BUG: 55 vs. 115

The bug does *not* show up on SSE4_2 machines (either 32 or 64-bit mode).

Additional analysis from <email address hidden>:

I'm not completely sure, but this is what I see so far. The bug can only occur when the second argument to strstr (the needle) is periodic, which is to say that it consists entirely of some repeated string. When that happens, the code can fail to match if the first argument to strstr (the haystack) contains two or more repetitions of the needle's periodic string, but not as many as the number of occurrences as are in the needle. In that case strstr can sometimes return a pointer to the smaller number of repetitions, when it should properly return NULL or a later pointer. Also, the needle has to be 32 bytes or more.

Created attachment 5037
slightly simplified test case

I think the problem is the Boyer-Moore shift in two_way_long_needle in
str-two-way.h. It does not correctly update MEMORY. I think we need something
like

          if (memory && shift < period)
        {
          /* Since needle is periodic, but the last period has
             a byte out of place, there can be no match until
             after the mismatch. */
          shift = needle_len - period;
          memory = 0;
        }
          else if (memory > shift)
        memory = memory - shift;
          else
        memory = 0;

Yep, resetting 'memory' after a large shift is required; I'm testing your idea now, but think you have the right patch in mind.

Your test for (memory > shift) will never be reached. Other than the assignment added by your proposed patch, memory is only ever assigned to be 0 or needle_len - period. And for a periodic needle, shift is either needle_len or < period, by virtue of how the shift table is constructed. Therefore, if memory is non-zero but shift >= period, then shift is necessarily > memory at that point.

Which means your code can be reduced to this simpler patch:

diff --git i/string/str-two-way.h w/string/str-two-way.h
index 502af47..76044b3 100644
--- i/string/str-two-way.h
+++ w/string/str-two-way.h
@@ -350,8 +350,8 @@ two_way_long_needle (const unsigned char *haystack,
        a byte out of place, there can be no match until
        after the mismatch. */
     shift = needle_len - period;
- memory = 0;
   }
+ memory = 0;
        j += shift;
        continue;
      }

Created attachment 5039
fix strstr and memmem

Please add a testcase and post to <email address hidden>.

Steve Atwell (satwell) wrote :

Interestingly enough:

strstr() and strcasestr() are only broken pre-SSE4, but memmem() is broken even on SSE4 machines.

On the other hand, on SSE4 machines, strstr() and strcasestr() are quadratic in behavior; in other words, the use of an assembly implementation has actually caused a performance regression over the fix for
http://sourceware.org/bugzilla/show_bug.cgi?id=5514

$ cat foo.c
#define _GNU_SOURCE
#include <string.h>
#include <stdlib.h>
#include <signal.h>
#include <unistd.h>

#define P ":012345678-"

static void quit (int sig) { exit (sig + 128); }
int main(int argc, char **argv)
{
  const char *hay = ";" ":013245678-" P P ":012345678." P ":012345678." P;
  const char *needle = P P P;

  size_t m = 1000000;
  char *largehay = malloc (2 * m + 2);
  char *largeneedle = malloc (m + 2);

  signal (SIGALRM, quit);
  alarm (5);
  if (!largehay || !largeneedle)
    return 2;
  memset (largehay, 'A', 2 * m);
  largehay[2 * m] = 'B';
  largehay[2 * m + 1] = 0;
  memset (largeneedle, 'A', m);
  largeneedle[m] = 'B';
  largeneedle[m + 1] = 0;

  switch (argc > 1 ? atoi (argv[1]) : 0)
    {
      /* Demonstrate str-two-way.h bug. */
    case 1:
      return !!memmem (hay, strlen (hay), needle, strlen (needle));
    case 2:
      return !!strstr (hay, needle);
    case 3:
      return !!strcasestr (hay, needle);

      /* Demonstrate quadratic behavior. */
    case 4:
      return !memmem (largehay, strlen (largehay),
        largeneedle, strlen (largeneedle));
    case 5:
      return !strstr (largehay, largeneedle);
    case 6:
      return !strcasestr (largehay, largeneedle);

      /* Usage error. */
    default:
      return 2;
    }
}
$ for i in $(seq 6); do ./foo $i; echo $?; done
1
0
0
0
142
142

Fixed in git.

(In reply to comment #9)
> Fixed in git.

The incorrect results of memmem() and of non-SSE4 strstr() are fixed. However, the glibc 2.11 regression of reintroducing quadratic behavior for SSE4 strstr is not yet fixed.

Changed in eglibc (Ubuntu Maverick):
status: New → Confirmed
Changed in eglibc (Ubuntu Lucid):
status: New → Confirmed
importance: Undecided → High
milestone: none → lucid-updates
Changed in eglibc (Ubuntu Maverick):
importance: Undecided → High
milestone: none → maverick-updates
Matthias Klose (doko) on 2010-10-07
Changed in eglibc (Ubuntu Lucid):
status: Confirmed → In Progress
Changed in eglibc (Ubuntu Maverick):
status: Confirmed → In Progress
Changed in eglibc (Ubuntu Lucid):
assignee: nobody → Matthias Klose (doko)
Changed in eglibc (Ubuntu Maverick):
assignee: nobody → Matthias Klose (doko)
Changed in ubuntu-release-notes:
status: New → Won't Fix

Accepted eglibc into maverick-proposed, the package will build now and be available in a few hours. Please test and give feedback here. See https://wiki.ubuntu.com/Testing/EnableProposed for documentation how to enable and use -proposed. Thank you in advance!

Changed in eglibc (Ubuntu Maverick):
status: In Progress → Fix Committed
tags: added: verification-needed
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package eglibc - 2.12.1-0ubuntu7

---------------
eglibc (2.12.1-0ubuntu7) maverick-proposed; urgency=low

  * Fix issue #12092, strstr broken for some inputs on pre-SSE4 machines.
    LP: #655463.
 -- Matthias Klose <email address hidden> Thu, 07 Oct 2010 09:01:06 +0200

Changed in eglibc (Ubuntu Maverick):
status: Fix Committed → Fix Released
Martin Pitt (pitti) wrote :

This was copied to Natty (not to maverick, LP getting confused)

Changed in eglibc (Ubuntu):
milestone: maverick-updates → none
status: Fix Committed → Fix Released
Changed in eglibc (Ubuntu Maverick):
status: Fix Released → Fix Committed
Jean-Baptiste Lallement (jibel) wrote :

SRU verification for Maverick:
I have reproduced the problem with eglibc 2.12.1-0ubuntu6 in maverick and have verified that the version of eglibc 2.12.1-0ubuntu7 in -proposed fixes the issue. Test done on an Intel P4.

Marking as verification-done

tags: added: verification-done
removed: verification-needed
Martin Pitt (pitti) wrote :

Accepted eglibc into lucid-proposed, the package will build now and be available in a few hours. Please test and give feedback here. See https://wiki.ubuntu.com/Testing/EnableProposed for documentation how to enable and use -proposed. Thank you in advance!

Changed in eglibc (Ubuntu Lucid):
status: In Progress → Fix Committed
tags: removed: verification-done
tags: added: verification-needed
Jean-Baptiste Lallement (jibel) wrote :

SRU verification for Lucid:
I have reproduced the problem with eglibc 2.11.1-0ubuntu7.2 in lucid-updates and have verified that the version of eglibc 2.11.1-0ubuntu7.4 in -proposed fixes the issue. It passes the QRT:scripts/{test-glibc-security.py,test-glibc.py} Tests done on an Intel P4.

Marking as verification-done

tags: added: verification-done
removed: verification-needed
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package eglibc - 2.12.1-0ubuntu7

---------------
eglibc (2.12.1-0ubuntu7) maverick-proposed; urgency=low

  * Fix issue #12092, strstr broken for some inputs on pre-SSE4 machines.
    LP: #655463.
 -- Matthias Klose <email address hidden> Thu, 07 Oct 2010 09:01:06 +0200

Changed in eglibc (Ubuntu Maverick):
status: Fix Committed → Fix Released
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package eglibc - 2.11.1-0ubuntu7.4

---------------
eglibc (2.11.1-0ubuntu7.4) lucid-proposed; urgency=low

  * Fix issue #12092, strstr broken for some inputs on pre-SSE4 machines.
    LP: #655463.
 -- Matthias Klose <email address hidden> Thu, 07 Oct 2010 09:09:06 +0200

Changed in eglibc (Ubuntu Lucid):
status: Fix Committed → Fix Released
Changed in eglibc:
importance: Unknown → Medium
status: Unknown → Fix Released
To post a comment you must log in.
This report contains Public information  Edit
Everyone can see this information.

Other bug subscribers

Remote bug watches

Bug watches keep track of this bug in other bug trackers.