evergreen.lpad_number_substrings doesn't handle internal substring matches properly

Bug #1464765 reported by Jason Boyer on 2015-06-12
6
This bug affects 1 person
Affects Status Importance Assigned to Milestone
Evergreen
Low
Unassigned

Bug Description

All versions with evergreen.lpad_number_substrings, other versions N/A (Note: The version of this function in Eg 2.7 is different than the version in master, but I put the current version into my test db and it doesn't fix this problem.)

The evergreen.lpad_number_substrings function is used in multiple places to create sortkeys for various labels, but if the label contains smaller substrings of numbers that are matched within larger strings of numbers, the padding will be off. (This is harder to explain in English than it is to show...)

As an example, this monograph part label '2015 01 (Jan)' will call this function with a pad of '0' and a length of 10, which should result in '0000002015 0000000001 (Jan)' (I'm ignoring the naco_normalize step here, it's not relevant to the issue) but because the 01 is also present in 2015, the result is '000000200000000015 0000000001 (Jan)' The fix is to make the s/// at the bottom of the function (circa 2.7) more picky about its matches. I've found out what it needs to be, now I need to learn about these tests I keep hearing so much about. :)

If you are impatient and want to help test before I have time to figure the pgtap testing part out, this is the function that works for me, note the s/(^|\D)$1($|\D)/$1$padded$2/sg near the bottom:

CREATE OR REPLACE FUNCTION evergreen.lpad_number_substrings(text, text, integer)
 RETURNS text
 LANGUAGE plperlu
AS $function$
    my $string = shift;
    my $pad = shift;
    my $len = shift;
    my $find = $len - 1;

    while ($string =~ /(?:^|\D)(\d{1,$find})(?:$|\D)/) {
        my $padded = $1;
        $padded = $pad x ($len - length($padded)) . $padded;
        $string =~ s/(^|\D)$1($|\D)/$1$padded$2/sg;
    }

    return $string;
$function$

This should be an easy test to write, I'm hoping I can get to it soon.

Jason Boyer (jboyer) on 2015-06-12
description: updated
Thomas Berezansky (tsbere) wrote :

How about something like this?

CREATE OR REPLACE FUNCTION evergreen.lpad_number_substrings(text, text, integer)
 RETURNS text
 LANGUAGE plperlu
AS $function$
    my $string = shift;
    my $pad = shift;
    my $len = shift;

    $string =~ s/([0-9]+)/$pad x ($len - length($1)) . $1/eg;

    return $string;
$function$

Jason Boyer (jboyer) on 2015-06-22
summary: - evergreen.lpad_number_substrings doesn't handle "internal" substrings
- properly
+ evergreen.lpad_number_substrings doesn't handle internal substring
+ matches properly
Jason Boyer (jboyer) wrote :

Thomas, that also works and I prefer it since doesn't need a loop. Thanks!

Jason Boyer (jboyer) wrote :

Finally put this together, a couple busy, busy weeks were delaying me. :-/ Here's the branch, including test. I updated an existing test for this function rather than add a new file to test the same function, but can change that if desired.

http://git.evergreen-ils.org/?p=working/Evergreen.git;a=shortlog;h=refs/heads/collab/jboyer/lp1464765_lpad_number

tags: added: pullrequest
Jason Boyer (jboyer) on 2015-08-19
Changed in evergreen:
milestone: none → 2.next
Galen Charlton (gmc) on 2015-11-05
Changed in evergreen:
assignee: nobody → Galen Charlton (gmc)
Galen Charlton (gmc) wrote :

As discussed with Jason during the Hack-a-Way, this functionally duplicates bug 1155313. However, since this version of the stored procedure is about a third faster, I've pushed a modified version of this to master. Thanks, Jason!

Changed in evergreen:
assignee: Galen Charlton (gmc) → nobody
status: New → Fix Committed
importance: Undecided → Low
Changed in evergreen:
milestone: 2.next → 2.10-beta
Changed in evergreen:
status: Fix Committed → Fix Released
To post a comment you must log in.
This report contains Public information  Edit
Everyone can see this information.

Other bug subscribers