fn:unparsed-text-lines test hangs

Bug #1073175 reported by Sorin Marian Nasoi
This bug report is a duplicate of: Bug #1169908: Zorba hangs with invalid utf-8 input.
Affects   Status   Importance   Assigned to     Milestone
Zorba     New      Critical     Paul J. Lucas

Bug Description

import module namespace xqxq = 'http://www.zorba-xquery.com/modules/xqxq';

declare namespace resolver = 'http://www.zorba-xquery.com/modules/xqxq/url-resolver';
declare namespace op = 'http://www.zorba-xquery.com/options/features';
declare namespace f = 'http://www.zorba-xquery.com/features';
declare option op:enable 'f:hof';

declare function resolver:url-resolver($namespace as xs:string, $entity as xs:string) {
  switch($entity)
  case ''
    return switch($namespace)
           case 'http://www.w3.org/fots/unparsed-text/text-plain-utf-16be-bom-lines.txt'
             return fn:unparsed-text('/home/spungi/work/zorba/w3c_repo/2011/QT3-test-suite/fn/unparsed-text/text-plain-utf-16be-bom-lines.txt','utf-8')
           default return ()
  default return ()
};
variable $queryID := xqxq:prepare-main-module('
xquery version ''3.0'';

declare namespace op = ''http://www.zorba-xquery.com/options/features'';
declare namespace f = ''http://www.zorba-xquery.com/features'';
declare option op:enable ''f:hof'';
fn:unparsed-text-lines("http://www.w3.org/fots/unparsed-text/text-plain-utf-16be-bom-lines.txt") ! string-length(.)'
, resolver:url-resolver#2, ());

xqxq:evaluate($queryID)

Revision history for this message
Sorin Marian Nasoi (sorin.marian.nasoi) wrote :

Additional info about the problem: as the file name indicates, the file is encoded in UTF-16BE with a BOM,
yet the resolver passes 'utf-8' as the encoding to fn:unparsed-text.

According to the spec, in this case err:FOUT1190 should be raised.
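The mismatch described above can be illustrated with a small BOM check. This is only a sketch under stated assumptions: the helper names `encoding_from_bom` and `bom_contradicts` are hypothetical, they are not Zorba functions, and this is not necessarily how Zorba or the spec detects the problem (the error ultimately comes from octets that cannot be decoded in the declared encoding). The sketch relies on the fact that the bytes 0xFE and 0xFF never occur in well-formed UTF-8, so a file starting with the UTF-16BE BOM cannot be valid UTF-8.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper: name the encoding implied by a leading byte-order
// mark, or return "" when no BOM is recognized. UTF-32 BOMs are ignored here.
inline std::string encoding_from_bom(const std::string& bytes) {
  auto at = [&](std::size_t i) { return static_cast<unsigned char>(bytes[i]); };
  if (bytes.size() >= 3 && at(0) == 0xEF && at(1) == 0xBB && at(2) == 0xBF)
    return "utf-8";
  if (bytes.size() >= 2 && at(0) == 0xFE && at(1) == 0xFF)
    return "utf-16be";
  if (bytes.size() >= 2 && at(0) == 0xFF && at(1) == 0xFE)
    return "utf-16le";
  return "";
}

// Hypothetical helper: true when a BOM is present and names a different
// encoding than the one the caller declared (expected in lower case).
inline bool bom_contradicts(const std::string& bytes, const std::string& declared) {
  assert(!declared.empty());
  const std::string found = encoding_from_bom(bytes);
  return !found.empty() && found != declared;
}
```

For the file in this report, the first two bytes are the UTF-16BE BOM, so a check like this would flag the declared 'utf-8' before any decoding is attempted.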

Changed in zorba:
importance: Undecided → Critical
milestone: none → 2.8
Juan Zacarias (juan457) wrote :

Hi,

The bug is fixed. I also fixed this case for unparsed-text and unparsed-text-available: unparsed-text wasn't throwing the error either, and unparsed-text-available was returning true for this input.

Chris Hillery (ceejatec)
Changed in zorba:
milestone: 2.8 → 2.9
Chris Hillery (ceejatec)
tags: added: hotlist
Chris Hillery (ceejatec) wrote :

Paul, I need you to propose an API that meets the following requirements:

1. It should accept a std::istream, and either modify it in-place or return a new istream. (Alternately it could work on std::streambuf objects, if and only if all the memory-management requirements can still be met, which I don't think they can.)

2. It should accept a Zorba error code or exception object, a QueryLoc, and any additional error-message parameters as necessary.

3. It should arrange that any further reading from that istream will throw an instance of that Zorba exception, with all appropriate parameters, if the istream produces invalid UTF-8.

4. It should correctly handle memory management in all cases, de-allocating any created objects either when the istream is itself deleted or when an exception is thrown.

Please let me know how you would like to proceed, and any consequences of the above API that may not be obvious.

Once this is done, we may need/want to revisit transcode::streambuf in a similar fashion. If there is a clever template-y way to layer this stuff around any existing streambuf implementations, so much the better.
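As a rough illustration of what requirements 1 to 3 could look like when layered around an existing streambuf, here is a minimal sketch. Everything named here is an assumption for illustration: `invalid_utf8_error` stands in for a Zorba exception (a real one would carry an error code and a QueryLoc), `utf8_checking_streambuf` and `checked_first_line` are hypothetical names, and the wrapper does not own the underlying streambuf, so requirement 4 is left to the caller. The UTF-8 check is deliberately simplified and does not reject overlong forms or surrogates.

```cpp
#include <cassert>
#include <istream>
#include <sstream>
#include <stdexcept>
#include <streambuf>
#include <string>
#include <utility>

// Hypothetical stand-in for a Zorba exception.
struct invalid_utf8_error : std::runtime_error {
  using std::runtime_error::runtime_error;
};

// Filtering streambuf: hands out bytes from a wrapped, non-owned streambuf
// one at a time and throws on the first byte that breaks a UTF-8 sequence.
class utf8_checking_streambuf : public std::streambuf {
public:
  utf8_checking_streambuf(std::streambuf* src, std::string where)
    : src_(src), where_(std::move(where)) {
    assert(src_ != nullptr);
  }

protected:
  int_type underflow() override {
    if (gptr() < egptr())
      return traits_type::to_int_type(*gptr());
    const int_type c = src_->sbumpc();
    if (traits_type::eq_int_type(c, traits_type::eof())) {
      if (pending_ > 0) fail();  // input ended in the middle of a sequence
      return c;
    }
    ch_ = traits_type::to_char_type(c);
    check(static_cast<unsigned char>(ch_));
    setg(&ch_, &ch_, &ch_ + 1);  // expose exactly one validated byte
    return c;
  }

private:
  void check(unsigned char b) {
    if (pending_ > 0) {
      if ((b & 0xC0) != 0x80) fail();  // expected a continuation byte
      --pending_;
    } else if (b < 0x80) {
      // ASCII, nothing to track
    } else if (b >= 0xC2 && b <= 0xDF) {
      pending_ = 1;
    } else if ((b & 0xF0) == 0xE0) {
      pending_ = 2;
    } else if (b >= 0xF0 && b <= 0xF4) {
      pending_ = 3;
    } else {
      fail();  // 0x80..0xC1 and 0xF5..0xFF cannot start a sequence
    }
  }

  [[noreturn]] void fail() const {
    throw invalid_utf8_error("invalid UTF-8 byte sequence in " + where_);
  }

  std::streambuf* src_;
  std::string where_;
  int pending_ = 0;  // continuation bytes still expected
  char ch_ = 0;      // one-byte get area
};

// Usage sketch: read one line through the checker. badbit must be in the
// exception mask; otherwise the istream catches the exception and only
// sets badbit.
inline std::string checked_first_line(std::istream& raw, const std::string& where) {
  utf8_checking_streambuf checker(raw.rdbuf(), where);
  std::istream in(&checker);
  in.exceptions(std::ios_base::badbit);
  std::string line;
  std::getline(in, line);
  return line;
}
```

One consequence that may not be obvious: standard istream input functions catch exceptions thrown by the streambuf and rethrow them only when `badbit` is set in the stream's exception mask, so any API along these lines has to either set that mask or check `bad()` after each read.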

Changed in zorba:
assignee: Juan Zacarias (juan457) → Paul J. Lucas (paul-lucas)
summary: - FOTS: fn:unparsed-text-lines test hangs
+ fn:unparsed-text-lines test hangs
Chris Hillery (ceejatec)
tags: removed: core-runtime
tags: added: segfault
Paul J. Lucas (paul-lucas) wrote :

> 2. It should accept a Zorba error code or exception object, a QueryLoc, and any additional error-message parameters as necessary.

Can you explain why it needs an error code that isn't ZXQD0006_INVALID_UTF8_BYTE_SEQUENCE? AFAIK, there isn't an error code defined in the spec for this (which is why I made a Zorba-specific one up).

What QueryLoc should it use? The same stream might be read from several places throughout the code.

Sorin Marian Nasoi (sorin.marian.nasoi) wrote :

The test case that used to hang was "fn-unparsed-text-lines-052" from the "fn-unparsed-text-lines" test set.
In the current trunk, the test case still fails, but it no longer hangs or crashes.

Chris Hillery (ceejatec)
tags: removed: segfault
Chris Hillery (ceejatec)
tags: removed: fots
tags: removed: hotlist
Chris Hillery (ceejatec) wrote :

I have opened a new bug which isolates this issue more thoroughly and discusses the possible solutions. I have marked this bug as a duplicate, as the discussion of the particular FOTS test cases and problems is no longer relevant.
