set-based sequence operations

Bug #1058398 reported by Matthias Brantner
This bug affects 1 person
Affects | Status | Importance | Assigned to | Milestone
Zorba | Fix Committed | High | Paul J. Lucas |

Bug Description

Implement the following set-based sequence functions in a new sequence module:

module namespace seq = "http://zorba.io/modules/sequence";

declare function seq:set-intersect($seq1 as xs:anyAtomicType*, $seq2 as xs:anyAtomicType*) as xs:anyAtomicType* external;

declare function seq:set-union($seq1 as xs:anyAtomicType*, $seq2 as xs:anyAtomicType*) as xs:anyAtomicType* external;

declare function seq:set-except($seq1 as xs:anyAtomicType*, $seq2 as xs:anyAtomicType*) as xs:anyAtomicType* external;

- The function should use value comparison but use false instead of an error in case of a type mismatch.
- JSONiq module
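The three functions above can be sketched in C++ with hash sets. This is a minimal sketch under assumptions, not Zorba's implementation: items are modelled as plain `long long` values instead of xs:anyAtomicType, results are distinct and keep first-occurrence order, and `set_intersect`, `set_union` and `set_except` are hypothetical stand-ins for the seq:set-* functions.

```cpp
#include <unordered_set>
#include <vector>

// Hypothetical item model: long long stands in for xs:anyAtomicType.
using Item = long long;
using Seq = std::vector<Item>;

// Distinct items of seq1 that also occur in seq2, in seq1 order.
Seq set_intersect(const Seq& seq1, const Seq& seq2) {
  std::unordered_set<Item> in2(seq2.begin(), seq2.end());
  std::unordered_set<Item> emitted;  // values already written to the result
  Seq result;
  for (Item x : seq1)
    if (in2.count(x) && emitted.insert(x).second)
      result.push_back(x);
  return result;
}

// Distinct items of seq1, then distinct items of seq2 not seen yet.
Seq set_union(const Seq& seq1, const Seq& seq2) {
  std::unordered_set<Item> emitted;
  Seq result;
  for (const Seq* s : {&seq1, &seq2})
    for (Item x : *s)
      if (emitted.insert(x).second)
        result.push_back(x);
  return result;
}

// Distinct items of seq1 that do not occur in seq2, in seq1 order.
Seq set_except(const Seq& seq1, const Seq& seq2) {
  std::unordered_set<Item> in2(seq2.begin(), seq2.end());
  std::unordered_set<Item> emitted;
  Seq result;
  for (Item x : seq1)
    if (!in2.count(x) && emitted.insert(x).second)
      result.push_back(x);
  return result;
}
```

With $seq1 = (1, 2, 2, 3) and $seq2 = (2, 3, 4) the sketch yields (2, 3), (1, 2, 3, 4) and (1). Each call makes one pass over each input, so the expected cost grows linearly rather than with |seq1| * |seq2|.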

Changed in zorba:
importance: Undecided → High
assignee: nobody → Dana Florescu (dflorescu)
milestone: none → 2.9
tags: added: new-functionality-requirement
Changed in zorba:
milestone: 2.9 → 3.0
Revision history for this message
Chris Hillery (ceejatec) wrote :

Assigning this back to you, Matthias, since Dana won't be getting to it in the 3.0 timeframe. Please assess/prioritize/defer along with the rest of your bugs.

Changed in zorba:
assignee: Dana Florescu (dflorescu) → Matthias Brantner (matthias-brantner)
Chris Hillery (ceejatec)
Changed in zorba:
status: New → Confirmed
description: updated
Changed in zorba:
assignee: Matthias Brantner (matthias-brantner) → Paul J. Lucas (paul-lucas)
Revision history for this message
Paul J. Lucas (paul-lucas) wrote :

Do you think this can (should?) be implemented in pure XQuery or in C++?

Revision history for this message
Matthias Brantner (matthias-brantner) wrote : Re: [Bug 1058398] value-based sequence operations

It can be implemented in XQuery but should be implemented in C++ for performance
reasons.

Matthias

On Sep 23, 2013, at 8:19 AM, Paul J. Lucas <email address hidden> wrote:

> Do you think this can (should?) be implemented in pure XQuery or in C++?

Revision history for this message
Paul J. Lucas (paul-lucas) wrote : Re: value-based sequence operations

Should this be a core or non-core module?

Revision history for this message
Matthias Brantner (matthias-brantner) wrote : Re: [Bug 1058398] Re: value-based sequence operations

core

On Sep 23, 2013, at 7:21 PM, "Paul J. Lucas" <email address hidden> wrote:

> Should this be a core or non-core module?

Changed in zorba:
status: Confirmed → In Progress
Revision history for this message
Paul J. Lucas (paul-lucas) wrote : Re: value-based sequence operations

Isn't:

    seq:value-intersect( $s1, $s2 )
    seq:value-union( $s1, $s2 )
    seq:value-except( $s1, $s2 )

the same as:

    distinct-values( $s1[ . = $s2 ] )
    distinct-values( ($s1, $s2) )
    distinct-values( $s1[ not( . = $s2 ) ] )

? The latter are already implemented in C++, so why are the value-* functions proposed here necessary?
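A literal evaluation of those three expressions shows where the cost comes from: every item of $s1 is checked against all of $s2 (the general comparison is existential), and distinct-values() adds another quadratic scan. The C++ below is a hypothetical illustration under assumptions (integer items only; first-occurrence order for distinct-values(), whose order XQuery leaves implementation-dependent), not a description of how Zorba actually evaluates them.

```cpp
#include <algorithm>
#include <vector>

using Item = long long;
using Seq = std::vector<Item>;

// Linear search, standing in for the existential ". = $s2" test.
static bool contains(const Seq& s, Item x) {
  return std::find(s.begin(), s.end(), x) != s.end();
}

// Quadratic distinct-values(): keeps the first occurrence of each value.
static Seq distinct(const Seq& s) {
  Seq out;
  for (Item x : s)
    if (!contains(out, x)) out.push_back(x);
  return out;
}

// distinct-values( $s1[ . = $s2 ] ): O(|s1| * |s2|) comparisons.
Seq naive_intersect(const Seq& s1, const Seq& s2) {
  Seq filtered;
  for (Item x : s1)
    if (contains(s2, x)) filtered.push_back(x);
  return distinct(filtered);
}

// distinct-values( ($s1, $s2) )
Seq naive_union(const Seq& s1, const Seq& s2) {
  Seq both(s1);
  both.insert(both.end(), s2.begin(), s2.end());
  return distinct(both);
}

// distinct-values( $s1[ not( . = $s2 ) ] )
Seq naive_except(const Seq& s1, const Seq& s2) {
  Seq filtered;
  for (Item x : s1)
    if (!contains(s2, x)) filtered.push_back(x);
  return distinct(filtered);
}
```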

Revision history for this message
Matthias Brantner (matthias-brantner) wrote : Re: [Bug 1058398] Re: value-based sequence operations

Correct, but the execution is really slow.
The reason for having them in C++ is purely performance.

Also, general comparison is not available in JSONiq and writing the below
queries is cumbersome.

On Sep 24, 2013, at 9:09 AM, "Paul J. Lucas" <email address hidden> wrote:

> Isn't:
>
> seq:value-intersect( $s1, $s2 )
> seq:value-union( $s1, $s2 )
> seq:value-except( $s1, $s2 )
>
> the same as:
>
> distinct-values( $s1[ . = $s2 ] )
> distinct-values( ($s1, $s2) )
> distinct-values( $s1[ not( . = $s2 ) ] )
>
> ? The latter are already implemented in C++, so why are the value-*
> functions proposed here necessary?

Revision history for this message
Paul J. Lucas (paul-lucas) wrote : Re: value-based sequence operations

Should these functions return distinct values? In thinking about it, there's no real reason why they have to. For example:

    let $s1 := (1, 2, 2, 3)
    let $s2 := (2, 3, 4)
    return seq:value-intersect( $s1, $s2 )

COULD return:

    (2, 2, 3)

which would be the same as:

    $s1[ . = $s2 ]

without using distinct-values()
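The non-distinct semantics described above amount to a filter: each item of $s1 is kept, duplicates included, if it occurs in $s2. In this hypothetical C++ sketch (integer items only; `filter_intersect` is an illustrative name), a hash set built from $s2 replaces the per-item scan that a literal reading of `$s1[ . = $s2 ]` implies.

```cpp
#include <unordered_set>
#include <vector>

using Item = long long;
using Seq = std::vector<Item>;

// Keeps every item of s1, duplicates included, that also occurs in s2,
// using one expected-constant-time hash lookup per item of s1.
Seq filter_intersect(const Seq& s1, const Seq& s2) {
  std::unordered_set<Item> in2(s2.begin(), s2.end());
  Seq result;
  for (Item x : s1)
    if (in2.count(x))
      result.push_back(x);
  return result;
}
```

For $s1 = (1, 2, 2, 3) and $s2 = (2, 3, 4) it returns (2, 2, 3).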

Revision history for this message
Matthias Brantner (matthias-brantner) wrote : Re: [Bug 1058398] value-based sequence operations

Tricky question.

> Should these functions return distinct values? In thinking about it,
> there's no real reason why they have to. For example:
>
> let $s1 := (1, 2, 2, 3)
> let $s2 := (2, 3, 4)
> return seq:value-intersect( $s1, $s2 )
>
> COULD return:
>
> (2, 2, 3)
I would expect (2, 3) because 2 appears only once in s2.
If we want to have this semantics, I don't see how it can be done efficiently.
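Under multiset (bag) semantics, each occurrence in $s2 can match at most one occurrence in $s1, so a value that appears once in $s2 appears at most once in the result. One possible approach, assuming items are hashable, is to count the occurrences in $s2 and consume one count per match. The C++ sketch below (integer items; `bag_intersect` is a hypothetical name) runs in expected linear time.

```cpp
#include <cstddef>
#include <unordered_map>
#include <vector>

using Item = long long;
using Seq = std::vector<Item>;

// Multiset intersection: each occurrence in s2 matches at most one
// occurrence in s1. Output follows s1 order.
Seq bag_intersect(const Seq& s1, const Seq& s2) {
  std::unordered_map<Item, std::size_t> remaining;  // unmatched copies in s2
  for (Item x : s2)
    ++remaining[x];
  Seq result;
  for (Item x : s1) {
    auto it = remaining.find(x);
    if (it != remaining.end() && it->second > 0) {
      --it->second;  // this copy of x in s2 is now used up
      result.push_back(x);
    }
  }
  return result;
}
```

For $s1 = (1, 2, 2, 3) and $s2 = (2, 3, 4) this returns (2, 3).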

Maybe we can have two functions:

1. one doing duplicate elimination
2. one not doing duplicate elimination

Maybe a third one could be used if both inputs are ordered.
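When both inputs are already sorted, intersection needs neither hashing nor quadratic scans: a single merge-style walk over both sequences suffices. A hypothetical C++ sketch of the duplicate-eliminating variant, assuming ascending integer inputs (`sorted_set_intersect` is an illustrative name):

```cpp
#include <cstddef>
#include <vector>

using Item = long long;
using Seq = std::vector<Item>;

// Assumes s1 and s2 are sorted ascending. Advances whichever side holds
// the smaller value; on a match, emits the value once. O(|s1| + |s2|).
Seq sorted_set_intersect(const Seq& s1, const Seq& s2) {
  Seq result;
  std::size_t i = 0, j = 0;
  while (i < s1.size() && j < s2.size()) {
    if (s1[i] < s2[j]) {
      ++i;
    } else if (s2[j] < s1[i]) {
      ++j;
    } else {
      // Equal values: skip if already emitted (input is sorted).
      if (result.empty() || result.back() != s1[i])
        result.push_back(s1[i]);
      ++i;
      ++j;
    }
  }
  return result;
}
```

The standard library's `std::set_intersection` performs the same walk over sorted ranges but keeps min(m, n) copies of a value that occurs m and n times, which corresponds to the multiset semantics instead.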

Maybe we can talk on Skype quickly.

Matthias

>
> which would be the same as:
>
> $s1[ . = $s2 ]
>
> without using distinct-values()

summary: - value-based sequence operations
+ set-based sequence operations
description: updated
Revision history for this message
Paul J. Lucas (paul-lucas) wrote :

FYI: similar to the way the casting code used to be, having this code return false rather than throw an exception for incomparable items will needlessly construct expensive ZorbaException objects. There's currently no way to ask "are Items i and j comparable to each other?".
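The pattern being warned against looks roughly like the following hypothetical C++ sketch: a strict comparison throws on a type mismatch, and the lenient version merely catches the exception, so every mismatched pair constructs, throws and unwinds an exception object. The item model (a `std::variant` of integer and string) and the use of `std::invalid_argument` in place of ZorbaException are assumptions for illustration.

```cpp
#include <stdexcept>
#include <string>
#include <variant>

// Hypothetical two-type item model: integers and strings only.
using Item = std::variant<long long, std::string>;

// Strict comparison: throws when the types differ, much as a value
// comparison raises a type error (XPTY0004).
bool strict_equal(const Item& a, const Item& b) {
  if (a.index() != b.index())
    throw std::invalid_argument("XPTY0004: incomparable types");
  return a == b;
}

// Lenient wrapper built on the strict one: each mismatch pays for
// constructing, throwing and catching an exception object.
bool lenient_equal_via_catch(const Item& a, const Item& b) {
  try {
    return strict_equal(a, b);
  } catch (const std::invalid_argument&) {
    return false;
  }
}
```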

Revision history for this message
Matthias Brantner (matthias-brantner) wrote : Re: [Bug 1058398] Re: set-based sequence operations

You should look at src/runtime/booleans/BooleanImpl.cpp:738.

This handles general comparison which I believe is the semantics we want to have
if we have heterogeneous sequences.

Matthias

On Sep 25, 2013, at 3:23 PM, "Paul J. Lucas" <email address hidden> wrote:

> FYI: similar to the way the casting code used to be, having this code
> return false rather than throw an exception for incomparable items will
> needlessly construct expensive ZorbaException objects. There's currently
> no way to ask "are Items i and j comparable to each other?".

Revision history for this message
Paul J. Lucas (paul-lucas) wrote :

But that code still can raise an error XPTY0004. Do you want errors raised? If not, then perhaps before trying to compare two items, see if one is a subtype of the other: if not, interpret that to mean "not equal" and never perform the comparison directly.
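The suggested pre-check can be sketched as follows: decide comparability from the types alone, treat incomparable pairs as unequal, and only then compare values, so no exception is ever constructed. The two-level hierarchy (integer derived from decimal, string unrelated), the `Item` struct and the function names are hypothetical; a real implementation would consult Zorba's own type system.

```cpp
#include <string>

// Hypothetical mini type hierarchy: Integer derives from Decimal;
// String is unrelated to both.
enum class Type { Decimal, Integer, String };

struct Item {
  Type type;
  double num;       // used for Decimal and Integer
  std::string str;  // used for String
};

// True if t is ancestor or derives from it.
bool derives_from(Type t, Type ancestor) {
  if (t == ancestor) return true;
  return t == Type::Integer && ancestor == Type::Decimal;
}

// Comparable only if one type is a subtype of the other.
bool comparable(Type a, Type b) {
  return derives_from(a, b) || derives_from(b, a);
}

bool lenient_equal(const Item& a, const Item& b) {
  if (!comparable(a.type, b.type))
    return false;  // "not equal"; the values are never compared
  if (a.type == Type::String)
    return a.str == b.str;  // comparable with String implies both are String
  return a.num == b.num;
}
```

For example, integer 2 and decimal 2.0 compare equal, while integer 1 and the string "1" are reported as different without any error.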

Revision history for this message
Matthias Brantner (matthias-brantner) wrote :

> But that code still can raise an error XPTY0004. Do you want errors
> raised? If not, then perhaps before trying to compare two items, see if
> one is a subtype of the other: if not, interpret that to mean "not
> equal" and never perform the comparison directly.
I think it should raise an error to be consistent with the general comparison.

Revision history for this message
Paul J. Lucas (paul-lucas) wrote :

So then when you wrote "The function should use value comparison but use false instead of an error in case of a type mismatch" in the original description, you're now saying you're retracting that requirement?

Revision history for this message
Matthias Brantner (matthias-brantner) wrote :

> So then when you wrote "The function should use value comparison but use
> false instead of an error in case of a type mismatch" in the original
> description, you're now saying you're retracting that requirement?
Hmm, I believe Ghislain wrote that.

I'm CC'ing him here to double check.

Matthias

Revision history for this message
Ghislain Fourny (gislenius) wrote :

It is my recollection of the discussion we had that we wanted to not throw errors in case of type mismatch, but just consider incompatible values different. I am fine either way and being consistent with general comparison makes sense, too.

Revision history for this message
Paul J. Lucas (paul-lucas) wrote : Re: [Bug 1058398] set-based sequence operations

On Sep 26, 2013, at 2:14 AM, Ghislain Fourny <email address hidden> wrote:

> It is my recollection of the discussion we had that we wanted to not
> throw errors in case of type mismatch, but just consider incompatible
> values different. I am fine either way and being consistent with general
> comparison makes sense, too.

Somebody needs to make a decision.

- Paul

Revision history for this message
Matthias Brantner (matthias-brantner) wrote :

I think we should make it consistent with the functx function which
uses general comparison and, hence, raises an error.

We can add more functions if needed.
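Following this decision, a strict set-intersect would raise an error instead of treating mismatched types as unequal. The C++ sketch below is one conservative reading under stated assumptions: items are integers or strings, any item of $seq1 whose type differs from a type present in $seq2 triggers an XPTY0004-style `std::invalid_argument`, and `strict_set_intersect` is a hypothetical name. An implementation of general comparison may stop at the first match, so it can raise fewer errors than this sketch does.

```cpp
#include <cstddef>
#include <set>
#include <stdexcept>
#include <string>
#include <unordered_set>
#include <variant>
#include <vector>

// Hypothetical two-type item model: integers and strings only.
using Item = std::variant<long long, std::string>;
using Seq = std::vector<Item>;

// Distinct items of seq1 that occur in seq2, in seq1 order. Throws if an
// item of seq1 would have to be compared with an item of another type.
Seq strict_set_intersect(const Seq& seq1, const Seq& seq2) {
  std::set<std::size_t> types2;  // variant alternatives present in seq2
  for (const Item& y : seq2) types2.insert(y.index());
  std::unordered_set<Item> in2(seq2.begin(), seq2.end());
  std::unordered_set<Item> emitted;
  Seq result;
  for (const Item& x : seq1) {
    for (std::size_t t : types2)
      if (t != x.index())
        throw std::invalid_argument("XPTY0004: incomparable types");
    if (in2.count(x) && emitted.insert(x).second)
      result.push_back(x);
  }
  return result;
}
```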

Matthias

On Sep 26, 2013, at 7:20 AM, Paul J. Lucas <email address hidden> wrote:

> On Sep 26, 2013, at 2:14 AM, Ghislain Fourny
> <email address hidden> wrote:
>
>> It is my recollection of the discussion we had that we wanted to not
>> throw errors in case of type mismatch, but just consider incompatible
>> values different. I am fine either way and being consistent with general
>> comparison makes sense, too.
>
> Somebody needs to make a decision.
>
> - Paul

Changed in zorba:
status: In Progress → Fix Committed