TMUDF: TableInfo::getNumRows() returns wrong count

Bug #1433192 reported by Weishiun Tsai
This bug affects 1 person
Affects    Status        Importance  Assigned to  Milestone
Trafodion  Fix Released  High        Hans Zeller

Bug Description

As shown in the following example, TableInfo::getNumRows() returns -1 even though the input table has 5 rows. The same example also shows that getNextRow() loops through the correct number of iterations even though TableInfo::getNumRows() returns an incorrect value.
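The check that produces the error below can be sketched roughly as follows. This is only a self-contained illustration, not the attached test: MockTableInfo and MockRowSource are hypothetical stand-ins invented here, and the real tmudr classes and their signatures are not reproduced. The sketch counts rows by iterating with getNextRow() and compares that count with the value reported by getNumRows().

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for the input table descriptor.
// This is NOT the real tmudr::TableInfo class.
struct MockTableInfo {
    long numRows;  // -1 mimics the value observed in this report
    long getNumRows() const { return numRows; }
};

// Hypothetical stand-in for the row source the UDF reads from.
class MockRowSource {
public:
    explicit MockRowSource(std::vector<int> rows)
        : rows_(std::move(rows)), pos_(0) {}

    // Returns true while another row is available.
    bool getNextRow() {
        assert(pos_ <= rows_.size());  // cursor never runs past the end
        if (pos_ < rows_.size()) {
            ++pos_;
            return true;
        }
        return false;
    }

private:
    std::vector<int> rows_;
    std::size_t pos_;
};

// Counts rows by iteration and compares the result with the reported count.
// Returns "ok" on a match, otherwise a message shaped like the one in the log.
std::string checkRowCount(const MockTableInfo& info, MockRowSource& src) {
    long seen = 0;
    while (src.getNextRow()) {
        ++seen;
    }
    const long reported = info.getNumRows();
    if (reported != seen) {
        return "Invalid getNumRows(): expecting " + std::to_string(seen) +
               " got " + std::to_string(reported);
    }
    return "ok";
}
```

With a five-row source and a descriptor reporting -1, checkRowCount() returns the mismatch message; with a descriptor reporting 5 it returns "ok".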

This is seen on the v0312 build installed on a workstation. To reproduce it:

(1) Download the attached tar file and untar it to extract its 3 files. Put the files in any directory <mydir>.
(2) Make sure that you have run ./sqenv.sh of your Trafodion instance first, as building the UDF needs $MY_SQROOT for the header files.
(3) Run build.sh from <mydir> to build the UDF .so file.
(4) In mytest.sql, edit the line create library qaTmudfLib file '<mydir>/qaTmudfTest.so'; and fill in <mydir>.
(5) From sqlci, obey mytest.sql

---------------------------------------------------------------------------------

Here is the execution output showing that the program raised a user-defined exception when TableInfo::getNumRows() returned a wrong number.

>>log mytest.log clear;
>>drop schema mytest cascade;

*** ERROR[1003] Schema TRAFODION.MYTEST does not exist.

--- SQL operation failed with errors.
>>create schema mytest;

--- SQL operation complete.
>>set schema mytest;

--- SQL operation complete.
>>
>>create library qaTmudfLib file '<mydir>/qaTmudfTest.so';

--- SQL operation complete.
>>
>>create table mytable (a int, b int);

--- SQL operation complete.
>>insert into mytable values (1,1),(2,2),(3,3),(4,4),(5,5);

--- 5 row(s) inserted.
>>
>>create table_mapping function qaTmudfGeneral()
+>external name 'QA_TMUDF'
+>language cpp
+>library qaTmudfLib;

--- SQL operation complete.
>>
>>select * from UDF(qaTmudfGeneral(TABLE(select * from mytable)));

*** ERROR[11252] Wrong input table row count:
Invalid getNumRows(): expecting 5 got -1
 (SQLSTATE 38001)

*** ERROR[2037] $Z000QRX:79: A message from process $Z000RKV:82 was incorrectly formatted and could not be processed.

*** ERROR[8906] An invalid or corrupt MXUDR reply could not be processed, possibly due to memory corruption in MXUDR while executing user-defined routines or an internal error in SQL.

--- 0 row(s) selected.
>>
>>drop schema mytest cascade;

--- SQL operation complete.
>>
>>exit;

Tags: sql-exe
Weishiun Tsai (wei-shiun-tsai) wrote :
Hans Zeller (hans-zeller) wrote :

Yes, the histogram statistics and cost parts of the C++ compiler interface are not yet implemented. I'll keep this bug until we support this interface (at least a basic version of it).

Changed in trafodion:
status: New → Confirmed
assignee: nobody → Hans Zeller (hans-zeller)
Hans Zeller (hans-zeller) wrote :

Removing the R1.1 milestone target, since I'm not sure we'll support this compiler interface for R1.1.

Changed in trafodion:
milestone: r1.1 → none
Changed in trafodion:
milestone: none → r1.2
Changed in trafodion:
status: Confirmed → In Progress
Trafodion-Gerrit (neo-devtools) wrote : Fix proposed to core (master)

Fix proposed to branch: master
Review: https://review.trafodion.org/1717

Trafodion-Gerrit (neo-devtools) wrote : Fix merged to core (master)

Reviewed: https://review.trafodion.org/1717
Committed: https://github.com/trafodion/core/commit/f99dc5d80d28bce3e193513a0fb528bad3d1d154
Submitter: Trafodion Jenkins
Branch: master

commit f99dc5d80d28bce3e193513a0fb528bad3d1d154
Author: Hans Zeller <email address hidden>
Date: Mon Jun 1 20:54:13 2015 +0000

    Costing and statistics compiler interfaces for UDFs

    blueprint cmp-tmudf-compile-time-interface
    bug 1433192

    This change adds compiler interfaces for UDFs that give information
    about statistics of the result table and also a cost estimate. It also
    has more code for the upcoming Java UDF feature, retrieving updated
    invocation infos and returning them back to the executor/compiler C++
    code.

    Description of the changes in more detail:

    - Addressed remaining review comments from my last checkin,
      https://review.trafodion.org/1655
    - Make sure that user-generated exceptions during deallocation of
      a routine are reported. These happen in the destructor of the
      object derived from tmudr::UDR. For Java, we may need a deallocate
      method.
    - Java and JNI code to serialize the updated UDRInvocationInfo and
      UDRPlanInfo object after calling the user code and return them back
      through the JNI interface to the calling C++ code.
    - The cost method source files had some inline methods defined in
      the .cpp file and used an include file that included other .cpp
      files. Make didn't pick up changes made in these files. Removed
      this code and changed it to regular methods and inlines.
    - Replaced some Context * parameters in costing with PlanWorkSpace *,
      to be able to get to UDF-related info that's stored in a special
      PlanWorkSpace.
    - Changed the behavior of isBigMemoryOperator() for TMUDFs. If the
      UDF writer specifies the DoP for the UDF invocation, then consider
      it a BMO.
    - If possible, synthesize the HASH2 partitioning function of a TMUDF's
      child as the partitioning function of the UDF. This can be done if
      the partitioning key gets passed through the UDF.
    - Statistics interface for TMUDFs:
      - TMUDF now populates statistics field in the UDRInvocationInfo
        object and calls the describeStatistics() method.
      - Added an estimated # of partitions for partitioned input tables
        of TMUDFs. Also changed row count methods to "estimated" row count.
      - Added code to incorporate the information on row count and UEC
        provided by the UDF writer into statistics of the TMUDF. This code
        is not that suitable for coding it as the default implementation
        of describeStatistics(). Therefore, the default implementation of
        describeStatistics() does nothing, but the compiler applies some
        heuristics in case the UDF writer provides no statistics.
    - Changed cost method for TMUDFs to incorporate an estimated cost
      per row from the UDF writer. There is no special compiler interface
      call to ask for the cost; it can be set from the
      describeDesiredDegreeOfParallelism() call and, once supported, from
      the describePlanProperties() call. N...
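The fallback behavior described in the commit message (use the UDF writer's row count and per-row cost when provided, otherwise let the compiler apply a heuristic) can be illustrated with a small sketch. Everything here is an assumption made for illustration: the UdfEstimates struct, the negative "not provided" sentinel, and the heuristic of reusing the child's row estimate are hypothetical and are not taken from the Trafodion sources.

```cpp
#include <cassert>

// Hypothetical sentinel: a negative value means "not provided by the UDF writer".
constexpr long kUnknownRows = -1;

// Hypothetical container for what a UDF writer might supply at compile time.
struct UdfEstimates {
    long estimatedRows = kUnknownRows;  // estimated result row count
    double costPerRow = -1.0;           // estimated cost per row
};

// Use the writer's estimate if present; otherwise fall back to the child's
// estimated row count (an illustrative heuristic only, not Trafodion's).
long resolveRowEstimate(const UdfEstimates& e, long childEstimatedRows) {
    assert(childEstimatedRows >= 0);
    return e.estimatedRows >= 0 ? e.estimatedRows : childEstimatedRows;
}

// Cost = resolved row estimate * per-row cost, with a default per-row cost
// when the writer supplied none.
double estimateCost(const UdfEstimates& e, long childEstimatedRows,
                    double defaultCostPerRow) {
    const double perRow = e.costPerRow >= 0.0 ? e.costPerRow : defaultCostPerRow;
    return static_cast<double>(resolveRowEstimate(e, childEstimatedRows)) * perRow;
}
```

The point of the sketch is that a caller never sees a raw "unknown" value such as -1: an unset estimate is resolved to a usable number before it enters the cost calculation.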


Changed in trafodion:
status: In Progress → Fix Committed
Weishiun Tsai (wei-shiun-tsai) wrote :

As part of the delivery, this particular interface no longer exists. New tests will be written to exercise any new interface. This bug report will be closed.

Changed in trafodion:
status: Fix Committed → Fix Released