Bug #1153919 “JSON parser doesn't report end line and col” : Bugs : Zorba

William Candillon (wcandillon) on 2013-03-12

Changed in zorba:
assignee:	nobody → Paul J. Lucas (paul-lucas)

Paul J. Lucas (paul-lucas) on 2013-03-15

description:

updated

Revision history for this message

Paul J. Lucas (paul-lucas) wrote on 2013-03-16:

#1

Who said $zerr:line-number-end and $zerr:column-number-end are supposed to have any value set at all?

Revision history for this message

William Candillon (wcandillon) wrote on 2013-03-16: Re: [Bug 1153919] Re: JSON parser doesn't report end line and col

#3

Nobody.
I'm just looking at source_line_end() and source_column_end(), it
works in the case of XML parsing.
http://www.zorba-xquery.com/html/documentation/2.8.0/cxx/classzorba_1_1XQueryException#acfc67763485cee7f4dffd51e7b2abf90

On Fri, Mar 15, 2013 at 5:26 PM, Paul J. Lucas
<email address hidden> wrote:
> Who said $zerr:line-number-end and $zerr:column-number-end are supposed
> to have any value set at all?
>
> --
> You received this bug notification because you are subscribed to the bug
> report.
> https://bugs.launchpad.net/bugs/1153919
>
> Title:
> JSON parser doesn't report end line and col
>
> Status in Zorba - The XQuery Processor:
> New
>
> Bug description:
> The following query:
>
> declare namespace err = "http://www.w3.org/2005/xqt-errors";
> declare namespace zerr = "http://www.zorba-xquery.com/errors";
>
> let $src := '{
> "_id" : "511C7C5C9A277C22D138802E",
> "question_id" : 1777103,
> "last_edit_date" : "2010-10-16T13:27:10",
> "creation_date" : "2009-11-21T23:09:55",
> "last_activity_date" : "2012-11-12T06:29:34",
> "score" : 248,'
> return
> try {
> jn:parse-json($src)
> }
> catch * {
> {
> "error": true,
> "code": $err:code,
> "description": $err:description,
> "module": $err:module,
> "line-number": $err:line-number,
> "column-number": $err:column-number,
> "line-number-end": $zerr:line-number-end,
> "column-number-end": $zerr:column-number-end
> }
> }
>
> Returns:
>
> {
> "error" : true,
> "code" : "jerr:JNDY0021",
> "description" : "\"<none>\": unexpected JSON token",
> "module" : "/Users/wcandillon/28msec/zorba/build/test.xq",
> "line-number" : 19,
> "column-number" : 17,
> "line-number-end" : 0,
> "column-number-end" : 0
> }
>
> In this scenario, "line-number-end" : 0, "column-number-end" : 0
> doesn't seem to be correct.
>
> To manage notifications about this bug go to:
> https://bugs.launchpad.net/zorba/+bug/1153919/+subscriptions

Nobody.
I'm just looking at source_line_end() and source_column_end(), it
works in the case of XML parsing.
http://www.zorba-xquery.com/html/documentation/2.8.0/cxx/classzorba_1_1XQueryException#acfc67763485cee7f4dffd51e7b2abf90

On Fri, Mar 15, 2013 at 5:26 PM, Paul J. Lucas
<1153919@bugs.launchpad.net> wrote:
> Who said $zerr:line-number-end and $zerr:column-number-end are supposed
> to have any value set at all?
>
> --
> You received this bug notification because you are subscribed to the bug
> report.
> https://bugs.launchpad.net/bugs/1153919
>
> Title:
>   JSON parser doesn't report end line and col
>
> Status in Zorba - The XQuery Processor:
>   New
>
> Bug description:
>   The following query:
>
>       declare namespace err = "http://www.w3.org/2005/xqt-errors";
>       declare namespace zerr = "http://www.zorba-xquery.com/errors";
>
>       let $src := '{
>         "_id" : "511C7C5C9A277C22D138802E",
>         "question_id" : 1777103,
>         "last_edit_date" : "2010-10-16T13:27:10",
>         "creation_date" : "2009-11-21T23:09:55",
>         "last_activity_date" : "2012-11-12T06:29:34",
>         "score" : 248,'
>       return
>         try {
>           jn:parse-json($src)
>         }
>         catch * {
>           {
>             "error": true,
>             "code": $err:code,
>             "description": $err:description,
>             "module": $err:module,
>             "line-number": $err:line-number,
>             "column-number": $err:column-number,
>             "line-number-end": $zerr:line-number-end,
>             "column-number-end": $zerr:column-number-end
>           }
>         }
>
>   Returns:
>
>   {
>     "error" : true,
>     "code" : "jerr:JNDY0021",
>     "description" : "\"<none>\": unexpected JSON token",
>     "module" : "/Users/wcandillon/28msec/zorba/build/test.xq",
>     "line-number" : 19,
>     "column-number" : 17,
>     "line-number-end" : 0,
>     "column-number-end" : 0
>   }
>
>   In this scenario, "line-number-end" : 0, "column-number-end" : 0
>   doesn't seem to be correct.
>
> To manage notifications about this bug go to:
> https://bugs.launchpad.net/zorba/+bug/1153919/+subscriptions

Revision history for this message

William Candillon (wcandillon) wrote on 2013-03-16:

#2

Nobody.
I'm just looking at source_line_end() and source_column_end(), it
works in the case of XML parsing.
http://www.zorba-xquery.com/html/documentation/2.8.0/cxx/classzorba_1_1XQueryException#acfc67763485cee7f4dffd51e7b2abf90

Revision history for this message

Paul J. Lucas (paul-lucas) wrote on 2013-03-19:

#4

Well, all the documentation says is that it returns the end line number or 0 if unset. Can you construct an example where the end line number is actually not equal to zero?

Revision history for this message

William Candillon (wcandillon) wrote on 2013-03-19:

#5

Ok if this is the expected behavior, perfect then.
Why not having the same end line and col that the start col and line?
That would help consumer of this API such as editors to not that to
deal with this special case, does it make sense?

On Tue, Mar 19, 2013 at 10:00 AM, Paul J. Lucas
<email address hidden> wrote:
> Well, all the documentation says is that it returns the end line number
> or 0 if unset. Can you construct an example where the end line number is
> actually not equal to zero?
>
> --
> You received this bug notification because you are subscribed to the bug
> report.
> https://bugs.launchpad.net/bugs/1153919
>
> Title:
> JSON parser doesn't report end line and col
>
> Status in Zorba - The XQuery Processor:
> New
>
> Bug description:
> The following query:
>
> declare namespace err = "http://www.w3.org/2005/xqt-errors";
> declare namespace zerr = "http://www.zorba-xquery.com/errors";
>
> let $src := '{
> "_id" : "511C7C5C9A277C22D138802E",
> "question_id" : 1777103,
> "last_edit_date" : "2010-10-16T13:27:10",
> "creation_date" : "2009-11-21T23:09:55",
> "last_activity_date" : "2012-11-12T06:29:34",
> "score" : 248,'
> return
> try {
> jn:parse-json($src)
> }
> catch * {
> {
> "error": true,
> "code": $err:code,
> "description": $err:description,
> "module": $err:module,
> "line-number": $err:line-number,
> "column-number": $err:column-number,
> "line-number-end": $zerr:line-number-end,
> "column-number-end": $zerr:column-number-end
> }
> }
>
> Returns:
>
> {
> "error" : true,
> "code" : "jerr:JNDY0021",
> "description" : "\"<none>\": unexpected JSON token",
> "module" : "/Users/wcandillon/28msec/zorba/build/test.xq",
> "line-number" : 19,
> "column-number" : 17,
> "line-number-end" : 0,
> "column-number-end" : 0
> }
>
> In this scenario, "line-number-end" : 0, "column-number-end" : 0
> doesn't seem to be correct.
>
> To manage notifications about this bug go to:
> https://bugs.launchpad.net/zorba/+bug/1153919/+subscriptions

Ok if this is the expected behavior, perfect then.
Why not having the same end line and col that the start col and line?
That would help consumer of this API such as editors to not that to
deal with this special case, does it make sense?

On Tue, Mar 19, 2013 at 10:00 AM, Paul J. Lucas
<1153919@bugs.launchpad.net> wrote:
> Well, all the documentation says is that it returns the end line number
> or 0 if unset. Can you construct an example where the end line number is
> actually not equal to zero?
>
> --
> You received this bug notification because you are subscribed to the bug
> report.
> https://bugs.launchpad.net/bugs/1153919
>
> Title:
>   JSON parser doesn't report end line and col
>
> Status in Zorba - The XQuery Processor:
>   New
>
> Bug description:
>   The following query:
>
>       declare namespace err = "http://www.w3.org/2005/xqt-errors";
>       declare namespace zerr = "http://www.zorba-xquery.com/errors";
>
>       let $src := '{
>         "_id" : "511C7C5C9A277C22D138802E",
>         "question_id" : 1777103,
>         "last_edit_date" : "2010-10-16T13:27:10",
>         "creation_date" : "2009-11-21T23:09:55",
>         "last_activity_date" : "2012-11-12T06:29:34",
>         "score" : 248,'
>       return
>         try {
>           jn:parse-json($src)
>         }
>         catch * {
>           {
>             "error": true,
>             "code": $err:code,
>             "description": $err:description,
>             "module": $err:module,
>             "line-number": $err:line-number,
>             "column-number": $err:column-number,
>             "line-number-end": $zerr:line-number-end,
>             "column-number-end": $zerr:column-number-end
>           }
>         }
>
>   Returns:
>
>   {
>     "error" : true,
>     "code" : "jerr:JNDY0021",
>     "description" : "\"<none>\": unexpected JSON token",
>     "module" : "/Users/wcandillon/28msec/zorba/build/test.xq",
>     "line-number" : 19,
>     "column-number" : 17,
>     "line-number-end" : 0,
>     "column-number-end" : 0
>   }
>
>   In this scenario, "line-number-end" : 0, "column-number-end" : 0
>   doesn't seem to be correct.
>
> To manage notifications about this bug go to:
> https://bugs.launchpad.net/zorba/+bug/1153919/+subscriptions

Revision history for this message

Paul J. Lucas (paul-lucas) wrote on 2013-03-19:

#6

I don't know which is correct. I looked at the code, and I don't see how the end line number could get there for XML which is why I asked you to try to construct some (bad) XML in a test case to see what the code does.

If it's always 0, then we could just remove the function rather than always returning the same as the start line.

Dana Florescu (dflorescu) on 2013-04-01

Changed in zorba:
importance:	Undecided → Medium

Paul J. Lucas (paul-lucas) on 2013-07-12

description:

updated

Revision history for this message

Paul J. Lucas (paul-lucas) wrote on 2013-07-12:

#7

I looked at this a bit and it's not clear that adding the end line/column really adds any useful information for the user. For the possible exceptions that can be thrown, most of the time the end line/column would be exactly the same as the start line/column -- either that, or it's difficult to compute what the correct end line/column should be.

* illegal_character: end line/column would always be the same as start line/column.

* illegal_codepoint: the current start line/column points to the character position in the JSON at which the illegal character comprising the codepoint is. For example, given "\uEFG0", the code currently set the start line/column to point at the 'G' since that's the character that's illegal. If you wanted to have a distinct end line/column, then you *could* make start line/column point to the \ (the start of the code-point) and end line/column point to the '0' (the end of the code-point), but, to me, that's actually *less* useful since it doesn't say which character is the illegal one.

* illegal_literal: here, you could actually make a case for having start line/column point to the first character of the literal and end line/column point to the last character of the literal. However, for a literal like:

false_by_itself_without_this_extra_stuff_is_a_valid_json_literal

the code would actually have to continue to scan to the end of the literal to find its end rather than fail immediately upon encountering the '_' which is what happens now. Is that really worth it when start line/column point to the 'f' in "false" now?

* illegal_number: same argument as illegal_literal.

* unterminated_string: even if you gave the end line/column, it *always* would point to the last character in the file.

* illegal_escape: end line/column would always be start line/column+1.

* unexpected_token: the end line/column could be the position of the last character of the token, but this really is not useful information if you have the start line/column.

Hence, I really don't think it's worth doing.

I looked at this a bit and it's not clear that adding the end line/column really adds any useful information for the user.  For the possible exceptions that can be thrown, most of the time the end line/column would be exactly the same as the start line/column -- either that, or it's difficult to compute what the correct end line/column should be.

* illegal_character: end line/column would always be the same as start line/column.

* illegal_codepoint: the current start line/column points to the character position in the JSON at which the illegal character comprising the codepoint is. For example, given "\uEFG0", the code currently set the start line/column to point at the 'G' since that's the character that's illegal. If you wanted to have a distinct end line/column, then you *could* make start line/column point to the \ (the start of the code-point) and end line/column point to the '0' (the end of the code-point), but, to me, that's actually *less* useful since it doesn't say which character is the illegal one.

* illegal_literal: here, you could actually make a case for having start line/column point to the first character of the literal and end line/column point to the last character of the literal. However, for a literal like:

false_by_itself_without_this_extra_stuff_is_a_valid_json_literal

the code would actually have to continue to scan to the end of the literal to find its end rather than fail immediately upon encountering the '_' which is what happens now. Is that really worth it when start line/column point to the 'f' in "false" now?

* illegal_number: same argument as illegal_literal.

* unterminated_string: even if you gave the end line/column, it *always* would point to the last character in the file.

* illegal_escape: end line/column would always be start line/column+1.

* unexpected_token: the end line/column could be the position of the last character of the token, but this really is not useful information if you have the start line/column.

Hence, I really don't think it's worth doing.

Revision history for this message

William Candillon (wcandillon) wrote on 2013-07-12:

#8

I was making this request in order to make it obvious to the user in a IDE. Like here: https://dl.dropboxusercontent.com/u/1487285/Screen%20Shot%202013-07-12%20at%206.46.12%20AM.png

For *unterminated_string, *illegal_escape, *unexpected_token, and *illegal_character it would be great to have it as you described.
But I understand that this is a low importance.
My assumption is that we didn't take end line and line column too seriously because we didn't realize then that these errors could clearly be vizualized.

Revision history for this message

William Candillon (wcandillon) wrote on 2013-07-12:

#9

My reading your comment it looks like we agree on the resolution of this bug and that we simply have to lower its importance/priority.

Revision history for this message

Paul J. Lucas (paul-lucas) wrote on 2013-07-12:

#10

Perhaps you misread my comment, but I do NOT think we agree. I want to close the bug as Won't Fix for the reasons I've outlined. Again, briefly: (1) in all cases, it offers no additional useful information; (2) in some cases, it complicates the lexer/parser in that it has to continue to lex/parse after the error is encountered in order to find the end line/column -- and in some cases, for example, a token that is 400MB long, it may not come for a while and exhaust memory.

Revision history for this message

William Candillon (wcandillon) wrote on 2013-07-12:

#11

But it sounds like end line/end column are not important.
They are the standard in any parser. The use case I showed in the comment above is quite compelling, don't you think?

Couldn't we just use end line and end col to have the same value that start line and start col? That would be a start.

I will look at other JSON parser and let you know.

Revision history for this message

Paul J. Lucas (paul-lucas) wrote on 2013-07-12:

#12

> But it sounds like end line/end column are not important.

That might be because, IMHO, they're not important.

> They are the standard in any parser.

When gcc gives me an error message, it only gives a single line/column. The same is actually true for zorba command-line errors.

> Couldn't we just use end line and end col to have the same value that start line and start col?

Then it provides no additional information, so what's the point?

Revision history for this message

Paul J. Lucas (paul-lucas) wrote on 2013-07-15:

#13

I've tinkered with this a bit more. On the ~paul-lucas/pjl-misc branch, it will now report different start/end line/col for invalid numbers, literals, and string literals, but the end line/col is the one at which the error is first detected (not the end of the literal, for example).

If you're OK with this, I'll propose it for merge.

Revision history for this message

William Candillon (wcandillon) wrote on 2013-07-15:

#14

sound's amazing.

Paul J. Lucas (paul-lucas) on 2013-07-17

Changed in zorba:
status:	New → Fix Committed

Zorba

JSON parser doesn't report end line and col

Bug Description

Other bug subscribers

Remote bug watches