search does not work for books whose name contains an underscore followed by 4 digits

Bug #530523 reported by mangtronix
This bug affects 1 person
Affects: Internet Archive BookReader
Status: Fix Released
Importance: Medium
Assigned to: mangtronix

Bug Description

Search returns neither any results nor a "no results" message for this book.

http://www.archive.org/stream/TheZenithYearbook1993HighPointUniversity/THE_ZENITH_1993

Tags: qa-verified
mangtronix (mang)
Changed in bookreader:
assignee: nobody → mangtronix (mang)
importance: Undecided → Medium
status: New → Confirmed
mangtronix (mang) wrote :

Search also does not work for this book: http://www.archive.org/stream/ANestableTaperedColumnConceptForLargeSpaceStructures/19760022270_1976022270#search/shuttle

The OCR .txt and .xml files both contain "Shuttle".

mangtronix (mang) wrote :

For the NASA book it looks like the problem is the regex for matching the page value in the search results.

NASA book:
<PARAM name="PAGE" value="19760022270_1976022270_0000.djvu"/>

Other book:
<PARAM name="PAGE" value="armageddonafter00couruoft_0001.djvu"/>

Regex code:
            var re = new RegExp (/_(\d{4})/);
            var reMatch = re.exec(pages[i].getAttribute('file'));
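To illustrate the failure, here is a minimal sketch that runs the quoted pattern against the two page names above. The helper name `pageFromFile` is hypothetical, and it works on plain strings instead of the `getAttribute('file')` call in the original code. Because the pattern is unanchored, `exec` returns the first underscore followed by four digits, which for the NASA book is part of the book name rather than the page number.

```javascript
// Pattern quoted in the report: first "_" followed by exactly 4 digits.
const re = /_(\d{4})/;

// Hypothetical helper: returns the captured digits, or null if no match.
function pageFromFile(file) {
  const m = re.exec(file);
  return m ? m[1] : null;
}

console.log(pageFromFile("armageddonafter00couruoft_0001.djvu")); // → 0001 (correct)
console.log(pageFromFile("19760022270_1976022270_0000.djvu"));    // → 1976 (wrong page)
```

The Zenith yearbook presumably hits the same problem through the "_1993" in its name.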

mangtronix (mang) wrote :

Fixed for both books by tightening up the regex.
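The report does not show the tightened regex that was committed. One possible tightening, sketched here as an assumption, is to anchor the four digits to the `.djvu` extension at the end of the name so that earlier underscore-digit runs in the book name are skipped. `pageFromFile` and the page name `THE_ZENITH_1993_0001.djvu` are hypothetical, not taken from the BookReader code or the yearbook's files.

```javascript
// Hypothetical tightened pattern (not necessarily the committed fix):
// the 4 digits must sit immediately before ".djvu" at the end of the name.
const pageRe = /_(\d{4})\.djvu$/;

function pageFromFile(file) {
  const m = pageRe.exec(file);
  return m ? m[1] : null; // null when the name has no _NNNN.djvu suffix
}

console.log(pageFromFile("armageddonafter00couruoft_0001.djvu")); // → 0001
console.log(pageFromFile("19760022270_1976022270_0000.djvu"));    // → 0000
// Hypothetical yearbook page name, assumed for illustration:
console.log(pageFromFile("THE_ZENITH_1993_0001.djvu"));           // → 0001
```

Anchoring on the suffix means any underscores or digit runs in the identifier or book name no longer matter; only the final `_NNNN.djvu` is read.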

Changed in bookreader:
milestone: none → r22
mangtronix (mang)
summary: - search does not work for this book
+ search does not work for books with name containing _NNNN
mangtronix (mang) wrote : Re: search does not work for books with name containing _NNNN

Please check that search works as expected for the following books:

1. http://www-testflip.archive.org/stream/ANestableTaperedColumnConceptForLargeSpaceStructures/19760022270_1976022270

2. http://www-testflip.archive.org/stream/TheZenithYearbook1993HighPointUniversity/THE_ZENITH_1993

3. A random selection of books from www-testflip.archive.org

This fix should not introduce any cross-browser differences, so testing two browser/OS combinations should be sufficient.

Just so you know, the OCR text is available from the "All files: HTTP" link and the file names are identifier/bookname_djvu.xml and identifier/bookname_djvu.txt. These are also available via our /download/ links, e.g.:
http://www.archive.org/download/ANestableTaperedColumnConceptForLargeSpaceStructures/19760022270_1976022270_djvu.txt
http://www.archive.org/download/ANestableTaperedColumnConceptForLargeSpaceStructures/19760022270_1976022270_djvu.xml
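As a small illustration of that naming pattern, the following sketch builds both OCR URLs from an identifier and a book name. The function name `ocrDownloadUrls` is hypothetical; the URL layout is taken from the two example links above.

```javascript
// Hypothetical helper: builds the OCR download URLs, assuming the layout
// http://www.archive.org/download/<identifier>/<bookname>_djvu.{txt,xml}
function ocrDownloadUrls(identifier, bookname) {
  const base = "http://www.archive.org/download/" + identifier + "/" + bookname;
  return { txt: base + "_djvu.txt", xml: base + "_djvu.xml" };
}

const urls = ocrDownloadUrls(
  "ANestableTaperedColumnConceptForLargeSpaceStructures",
  "19760022270_1976022270"
);
console.log(urls.txt);
console.log(urls.xml);
```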

tags: added: needs-qa
Changed in bookreader:
status: Confirmed → Triaged
Winnie (winnie-archive)
Changed in bookreader:
assignee: mangtronix (mang) → Winnie (winnie-archive)
mangtronix (mang) wrote :

Added bug report on searching for numbers not working: https://bugs.edge.launchpad.net/bookreader/+bug/540433

summary: - search does not work for books with name containing _NNNN
+ search does not work for books whose name contains an underscore
+ followed by 4 digits
Winnie (winnie-archive) wrote :

Search for alpha characters (words) had regressed. Fixed.

Changed in bookreader:
assignee: Winnie (winnie-archive) → mangtronix (mang)
mangtronix (mang) wrote :

Awesome, you rock! :)

mangtronix (mang)
Changed in bookreader:
status: Triaged → Fix Released
tags: added: qa-verified
removed: needs-qa