UDF: UDF query crashes tdm_arkesp with SIGBUS

Bug #1427886 reported by Weishiun Tsai
6
This bug affects 1 person
Affects Status Importance Assigned to Milestone
Trafodion
Won't Fix
Critical
Suresh Subbiah

Bug Description

In the following example, x1 and x2 are similar queries, except that a predicate in x1

and p_partkey = l_partkey

was replaced with a predicate with UDFs:

and mytest.myudf(p_partkey) = mytest.myudf(l_partkey)

Since myudf() simply returns the same INT input as the output, the 2 queries are also expected to see the same results as well. But as shown here, x1 ran fine while x2 crashed tdm_arkesps. Unfortunately, when this happened, both hs_err_pid<pid>.log files and core files were generated as empty files. The only useful information was the JRE output saying that it was a SIGBUS error that brought down pid 8095 and 8096.

[trafodion@n001 scripts]$ ls -al hs_err*.log
-rw-rw-r-- 1 trafodion trafodion 0 Mar 3 21:02 hs_err_pid8095.log
-rw-rw-r-- 1 trafodion trafodion 0 Mar 3 21:02 hs_err_pid8096.log

[trafodion@n001 501]$ ls -al core*
-rw------- 1 trafodion trafodion 0 Mar 3 21:02 core.1425416566.n001.8095.tdm_arkesp
-rw------- 1 trafodion trafodion 0 Mar 3 21:02 core.1425416566.n001.8096.tdm_arkesp

Notice that the plan for x2 was pretty bad, with a nested_join estimated to have 3E+018 rows. The bad plan may or may not have contributed to the crashing of tdm_arkesps. But in any case, even a bad plan shouldn’t crash tdm_arkesps like this.

This was seen on the v0303 build installed on a 4-node cluster. To reproduce this problem:

(1) Download the attached tar file and untar it to get the 4 files in there. Put the 4 files in any directory <mydir>
(2) Run build.sh from <mydir> to build the UDF so file.
(3) Copy <mydir> to all nodes on the cluster.
(4) Change the line ‘create library qa_udf_lib file '<mydir>/myudf.so'; in mytest.sql and fill in <mydir>
(5) From sqlci, obey mytest.sql

This test requires the QA g_tpch2x global tables. A separate program to populate this set of tables on a cluster is available upon request.

----------------------------------------------------------------------------------------------------------

Here is the execution output:

>>log mytest.log clear;
>>
>>create schema mytest;

--- SQL operation complete.
>>set schema mytest;

--- SQL operation complete.
>>
>>create library qa_udf_lib file '/opt/home/trafodion/udf_bug/myudf.so';

--- SQL operation complete.
>>
>>create function myudf
+>(INVAL int)
+>returns (OUTVAL int)
+>language c
+>parameter style sql
+>external name 'myudf'
+>library qa_udf_lib
+>deterministic
+>state area size 1024
+>allow any parallelism
+>no sql;

--- SQL operation complete.
>>
>>set schema g_tpch2x;

--- SQL operation complete.
>>
>>showddl part;

CREATE TABLE TRAFODION.G_TPCH2X.PART
  (
    P_PARTKEY INT NO DEFAULT NOT NULL NOT DROPPABLE
  , P_NAME VARCHAR(55) CHARACTER SET ISO88591 COLLATE
      DEFAULT NO DEFAULT NOT NULL NOT DROPPABLE
  , P_MFGR CHAR(25) CHARACTER SET ISO88591 COLLATE
      DEFAULT NO DEFAULT NOT NULL NOT DROPPABLE
  , P_BRAND CHAR(10) CHARACTER SET ISO88591 COLLATE
      DEFAULT NO DEFAULT NOT NULL NOT DROPPABLE
  , P_TYPE VARCHAR(25) CHARACTER SET ISO88591 COLLATE
      DEFAULT NO DEFAULT NOT NULL NOT DROPPABLE
  , P_SIZE INT NO DEFAULT NOT NULL NOT DROPPABLE
  , P_CONTAINER CHAR(10) CHARACTER SET ISO88591 COLLATE
      DEFAULT NO DEFAULT NOT NULL NOT DROPPABLE
  , P_RETAILPRICE NUMERIC(12, 2) NO DEFAULT NOT NULL NOT
      DROPPABLE
  , P_COMMENT VARCHAR(23) CHARACTER SET ISO88591 COLLATE
      DEFAULT NO DEFAULT NOT NULL NOT DROPPABLE
  , PRIMARY KEY (P_PARTKEY ASC)
  )
  SALT USING 8 PARTITIONS
;

--- SQL operation complete.
>>showddl supplier;

CREATE TABLE TRAFODION.G_TPCH2X.SUPPLIER
  (
    S_SUPPKEY INT NO DEFAULT NOT NULL NOT DROPPABLE
  , S_NAME CHAR(25) CHARACTER SET ISO88591 COLLATE
      DEFAULT NO DEFAULT NOT NULL NOT DROPPABLE
  , S_ADDRESS VARCHAR(40) CHARACTER SET ISO88591 COLLATE
      DEFAULT NO DEFAULT NOT NULL NOT DROPPABLE
  , S_NATIONKEY INT NO DEFAULT NOT NULL NOT DROPPABLE
  , S_PHONE CHAR(15) CHARACTER SET ISO88591 COLLATE
      DEFAULT NO DEFAULT NOT NULL NOT DROPPABLE
  , S_ACCTBAL NUMERIC(12, 2) NO DEFAULT NOT NULL NOT
      DROPPABLE
  , S_COMMENT VARCHAR(101) CHARACTER SET ISO88591
      COLLATE DEFAULT NO DEFAULT NOT NULL NOT DROPPABLE
  , PRIMARY KEY (S_SUPPKEY ASC)
  )
  SALT USING 8 PARTITIONS
;

--- SQL operation complete.
>>showddl lineitem;

CREATE TABLE TRAFODION.G_TPCH2X.LINEITEM
  (
    L_ORDERKEY INT NO DEFAULT NOT NULL NOT DROPPABLE
  , L_PARTKEY INT NO DEFAULT NOT NULL NOT DROPPABLE
  , L_SUPPKEY INT NO DEFAULT NOT NULL NOT DROPPABLE
  , L_LINENUMBER INT NO DEFAULT NOT NULL NOT DROPPABLE
  , L_QUANTITY NUMERIC(12, 2) NO DEFAULT NOT NULL NOT
      DROPPABLE
  , L_EXTENDEDPRICE NUMERIC(12, 2) NO DEFAULT NOT NULL NOT
      DROPPABLE
  , L_DISCOUNT NUMERIC(12, 2) NO DEFAULT NOT NULL NOT
      DROPPABLE
  , L_TAX NUMERIC(12, 2) NO DEFAULT NOT NULL NOT
      DROPPABLE
  , L_RETURNFLAG CHAR(1) CHARACTER SET ISO88591 COLLATE
      DEFAULT NO DEFAULT NOT NULL NOT DROPPABLE
  , L_LINESTATUS CHAR(1) CHARACTER SET ISO88591 COLLATE
      DEFAULT NO DEFAULT NOT NULL NOT DROPPABLE
  , L_SHIPDATE DATE NO DEFAULT NOT NULL NOT DROPPABLE
  , L_COMMITDATE DATE NO DEFAULT NOT NULL NOT DROPPABLE
  , L_RECEIPTDATE DATE NO DEFAULT NOT NULL NOT DROPPABLE
  , L_SHIPINSTRUCT CHAR(25) CHARACTER SET ISO88591 COLLATE
      DEFAULT NO DEFAULT NOT NULL NOT DROPPABLE
  , L_SHIPMODE CHAR(10) CHARACTER SET ISO88591 COLLATE
      DEFAULT NO DEFAULT NOT NULL NOT DROPPABLE
  , L_COMMENT VARCHAR(44) CHARACTER SET ISO88591 COLLATE
      DEFAULT NO DEFAULT NOT NULL NOT DROPPABLE
  , PRIMARY KEY (L_SHIPDATE ASC, L_ORDERKEY ASC, L_LINENUMBER ASC)
  )
  SALT USING 8 PARTITIONS
;

--- SQL operation complete.
>>showddl partsupp;

CREATE TABLE TRAFODION.G_TPCH2X.PARTSUPP
  (
    PS_PARTKEY INT NO DEFAULT NOT NULL NOT DROPPABLE
  , PS_SUPPKEY INT NO DEFAULT NOT NULL NOT DROPPABLE
  , PS_AVAILQTY INT NO DEFAULT NOT NULL NOT DROPPABLE
  , PS_SUPPLYCOST NUMERIC(12, 2) NO DEFAULT NOT NULL NOT
      DROPPABLE
  , PS_COMMENT VARCHAR(199) CHARACTER SET ISO88591
      COLLATE DEFAULT NO DEFAULT NOT NULL NOT DROPPABLE
  , PRIMARY KEY (PS_PARTKEY ASC, PS_SUPPKEY ASC)
  )
  SALT USING 8 PARTITIONS
;

--- SQL operation complete.
>>showddl orders;

CREATE TABLE TRAFODION.G_TPCH2X.ORDERS
  (
    O_ORDERKEY INT NO DEFAULT NOT NULL NOT DROPPABLE
  , O_CUSTKEY INT NO DEFAULT NOT NULL NOT DROPPABLE
  , O_ORDERSTATUS CHAR(1) CHARACTER SET ISO88591 COLLATE
      DEFAULT NO DEFAULT NOT NULL NOT DROPPABLE
  , O_TOTALPRICE NUMERIC(12, 2) NO DEFAULT NOT NULL NOT
      DROPPABLE
  , O_ORDERDATE DATE NO DEFAULT NOT NULL NOT DROPPABLE
  , O_ORDERPRIORITY CHAR(15) CHARACTER SET ISO88591 COLLATE
      DEFAULT NO DEFAULT NOT NULL NOT DROPPABLE
  , O_CLERK CHAR(15) CHARACTER SET ISO88591 COLLATE
      DEFAULT NO DEFAULT NOT NULL NOT DROPPABLE
  , O_SHIPPRIORITY INT NO DEFAULT NOT NULL NOT DROPPABLE
  , O_COMMENT VARCHAR(79) CHARACTER SET ISO88591 COLLATE
      DEFAULT NO DEFAULT NOT NULL NOT DROPPABLE
  , PRIMARY KEY (O_ORDERKEY ASC)
  )
  SALT USING 8 PARTITIONS
;

--- SQL operation complete.
>>showddl nation;

CREATE TABLE TRAFODION.G_TPCH2X.NATION
  (
    N_NATIONKEY INT NO DEFAULT NOT NULL NOT DROPPABLE
  , N_NAME CHAR(25) CHARACTER SET ISO88591 COLLATE
      DEFAULT NO DEFAULT NOT NULL NOT DROPPABLE
  , N_REGIONKEY INT NO DEFAULT NOT NULL NOT DROPPABLE
  , N_COMMENT VARCHAR(152) CHARACTER SET ISO88591
      COLLATE DEFAULT NO DEFAULT NOT NULL NOT DROPPABLE
  , PRIMARY KEY (N_NATIONKEY ASC)
  )
  SALT USING 8 PARTITIONS
;

--- SQL operation complete.
>>
>>select count(*) from part;

(EXPR)
--------------------

              400000

--- 1 row(s) selected.
>>select count(*) from supplier;

(EXPR)
--------------------

               20000

--- 1 row(s) selected.
>>select count(*) from lineitem;

(EXPR)
--------------------

            11997996

--- 1 row(s) selected.
>>select count(*) from partsupp;

(EXPR)
--------------------

             1600000

--- 1 row(s) selected.
>>select count(*) from orders;

(EXPR)
--------------------

             3000000

--- 1 row(s) selected.
>>select count(*) from nation;

(EXPR)
--------------------

                  25

--- 1 row(s) selected.
>>
>>prepare x1 from select
+>nation,
+>o_year,
+>cast(sum(amount) as numeric(18,2)) as sum_profit
+>from
+>(
+>select
+>n_name as nation,
+>extract(year from o_orderdate) as o_year,
+>l_extendedprice * (1 - l_discount) - ps_supplycost * l_quantity as amount
+>from
+>part,
+>supplier,
+>lineitem,
+>partsupp,
+>orders,
+>nation
+>where
+>s_suppkey = l_suppkey
+>and ps_suppkey = l_suppkey
+>and ps_partkey = l_partkey
+>and p_partkey = l_partkey
+>and o_orderkey = l_orderkey
+>and s_nationkey = n_nationkey
+>and p_name like '%maroon%'
+>) as profit
+>group by
+>nation,
+>o_year
+>order by
+>nation,
+>o_year desc;

*** WARNING[6007] Multi-column statistics for columns (L_PARTKEY, L_SUPPKEY) from table TRAFODION.G_TPCH2X.LINEITEM were not available. The columns were being used by Join operator. As a result, the access path chosen might not be the best possible.

*** WARNING[6007] Multi-column statistics for columns (PS_PARTKEY, PS_SUPPKEY) from table TRAFODION.G_TPCH2X.PARTSUPP were not available. The columns were being used by Join operator. As a result, the access path chosen might not be the best possible.

--- SQL command prepared.
>>
>>explain options 'f' x1;

LC RC OP OPERATOR OPT DESCRIPTION CARD
---- ---- ---- -------------------- -------- -------------------- ---------

20 . 21 root 1.75E+002
19 . 20 sort_partial_groupby 1.75E+002
18 . 19 esp_exchange 1:8(hash2) (m) 1.75E+002
17 . 18 sort 1.75E+002
16 . 17 hash_partial_groupby 1.75E+002
15 2 16 hybrid_hash_join 1.00E+005
14 4 15 hybrid_hash_join 1.00E+005
13 5 14 hybrid_hash_join 1.00E+005
12 . 13 esp_exchange 8(hash2):8(hash2) 1.00E+005
11 10 12 hybrid_hash_join 1.00E+005
. . 11 trafodion_scan LINEITEM 1.27E+007
9 . 10 esp_exchange 8(rep-b):8(hash2) 1.00E+005
8 7 9 hybrid_hash_join 1.00E+005
. . 8 trafodion_scan PARTSUPP 1.60E+006
6 . 7 esp_exchange 8(rep-b):8(hash2) 2.50E+004
. . 6 trafodion_scan PART 2.50E+004
. . 5 trafodion_scan ORDERS 3.00E+006
3 . 4 esp_exchange 8(rep-b):8(hash2) 2.00E+004
. . 3 trafodion_scan SUPPLIER 2.00E+004
1 . 2 esp_exchange 8(rep-b):8(hash2) 2.50E+001
. . 1 trafodion_scan NATION 2.50E+001

--- SQL operation complete.
>>execute x1;

NATION O_YEAR SUM_PROFIT
------------------------- ------ ---------------------

ALGERIA 1998 51783642.18
ALGERIA 1997 86286203.65
ALGERIA 1996 87097920.83
ALGERIA 1995 86882928.07
ALGERIA 1994 85966421.99
ALGERIA 1993 87074609.01
ALGERIA 1992 87927441.52
ARGENTINA 1998 55435935.23
ARGENTINA 1997 91774387.06
ARGENTINA 1996 90289325.47
ARGENTINA 1995 93665839.59
ARGENTINA 1994 94067595.27
ARGENTINA 1993 94091127.06
ARGENTINA 1992 94195502.45
BRAZIL 1998 53806942.52
BRAZIL 1997 88435716.63
BRAZIL 1996 85985843.86
BRAZIL 1995 90576370.23
BRAZIL 1994 89426585.52
BRAZIL 1993 88119304.16
BRAZIL 1992 89863038.36
CANADA 1998 55469958.60
CANADA 1997 93278053.35
CANADA 1996 91813409.66
CANADA 1995 93193957.62
CANADA 1994 93222855.11
CANADA 1993 92414219.35
CANADA 1992 94828697.21
CHINA 1998 50009112.60
CHINA 1997 89948012.81
CHINA 1996 91517724.00
CHINA 1995 87325462.54
CHINA 1994 89179962.33
CHINA 1993 91044221.76
CHINA 1992 89726739.83
EGYPT 1998 50048836.77
EGYPT 1997 87961101.49
EGYPT 1996 90615331.48
EGYPT 1995 86777608.66
EGYPT 1994 89648957.43
EGYPT 1993 89545217.40
EGYPT 1992 93588075.32
ETHIOPIA 1998 51627560.00
ETHIOPIA 1997 83501023.14
ETHIOPIA 1996 89358240.42
ETHIOPIA 1995 87303959.51
ETHIOPIA 1994 85819736.98
ETHIOPIA 1993 86368865.17
ETHIOPIA 1992 84230624.79
FRANCE 1998 53240880.80
FRANCE 1997 85417252.01
FRANCE 1996 88182913.31
FRANCE 1995 85977503.35
FRANCE 1994 86710296.33
FRANCE 1993 86075551.33
FRANCE 1992 88799415.97
GERMANY 1998 55042923.27
GERMANY 1997 93270702.21
GERMANY 1996 93615371.10
GERMANY 1995 91347562.11
GERMANY 1994 93542693.21
GERMANY 1993 90957008.14
GERMANY 1992 92624874.87
INDIA 1998 52125987.99
INDIA 1997 86331615.57
INDIA 1996 86708182.86
INDIA 1995 88455426.29
INDIA 1994 90342313.87
INDIA 1993 88743091.24
INDIA 1992 86633888.22
INDONESIA 1998 57417719.18
INDONESIA 1997 97448707.21
INDONESIA 1996 96031488.99
INDONESIA 1995 99980388.24
INDONESIA 1994 96218426.32
INDONESIA 1993 94473301.99
INDONESIA 1992 98194754.06
IRAN 1998 53321225.16
IRAN 1997 92668072.69
IRAN 1996 92431464.41
IRAN 1995 89544205.41
IRAN 1994 91517876.66
IRAN 1993 92899311.35
IRAN 1992 94407141.41
IRAQ 1998 55051293.89
IRAQ 1997 92699880.14
IRAQ 1996 94377557.56
IRAQ 1995 95685890.62
IRAQ 1994 92444139.75
IRAQ 1993 94387246.25
IRAQ 1992 93445942.71
JAPAN 1998 56725794.57
JAPAN 1997 93157954.26
JAPAN 1996 97141631.19
JAPAN 1995 94662310.77
JAPAN 1994 92171492.10
JAPAN 1993 95411251.83
JAPAN 1992 93475927.87
JORDAN 1998 58003200.23
JORDAN 1997 94521826.68
JORDAN 1996 95773487.21
JORDAN 1995 94413784.66
JORDAN 1994 94709353.53
JORDAN 1993 95670937.89
JORDAN 1992 98699878.40
KENYA 1998 48883759.27
KENYA 1997 81504618.97
KENYA 1996 82114383.00
KENYA 1995 83439911.26
KENYA 1994 83561206.58
KENYA 1993 84634159.99
KENYA 1992 82694718.17
MOROCCO 1998 50970709.29
MOROCCO 1997 86819448.48
MOROCCO 1996 86923222.15
MOROCCO 1995 87698174.39
MOROCCO 1994 83034306.27
MOROCCO 1993 86061193.88
MOROCCO 1992 86877626.95
MOZAMBIQUE 1998 55306558.55
MOZAMBIQUE 1997 93868982.94
MOZAMBIQUE 1996 88605463.05
MOZAMBIQUE 1995 90759974.50
MOZAMBIQUE 1994 92650101.71
MOZAMBIQUE 1993 93335380.94
MOZAMBIQUE 1992 92057711.09
PERU 1998 53847616.24
PERU 1997 93691793.37
PERU 1996 94726086.22
PERU 1995 94005136.19
PERU 1994 94034552.25
PERU 1993 91437995.75
PERU 1992 94641870.89
ROMANIA 1998 52972922.10
ROMANIA 1997 85815991.62
ROMANIA 1996 91900806.97
ROMANIA 1995 84648451.61
ROMANIA 1994 87105801.78
ROMANIA 1993 86028962.79
ROMANIA 1992 89308515.63
RUSSIA 1998 55978156.33
RUSSIA 1997 95059403.29
RUSSIA 1996 90764583.38
RUSSIA 1995 89425367.45
RUSSIA 1994 91887367.05
RUSSIA 1993 91869185.46
RUSSIA 1992 92846597.00
SAUDI ARABIA 1998 53331361.46
SAUDI ARABIA 1997 96741944.93
SAUDI ARABIA 1996 93032355.57
SAUDI ARABIA 1995 93803320.20
SAUDI ARABIA 1994 96144382.02
SAUDI ARABIA 1993 95271436.72
SAUDI ARABIA 1992 92725077.54
UNITED KINGDOM 1998 50753170.93
UNITED KINGDOM 1997 87019576.17
UNITED KINGDOM 1996 88505904.65
UNITED KINGDOM 1995 86581607.32
UNITED KINGDOM 1994 87741002.36
UNITED KINGDOM 1993 90547372.64
UNITED KINGDOM 1992 87386224.52
UNITED STATES 1998 56130427.30
UNITED STATES 1997 91382625.48
UNITED STATES 1996 94330918.13
UNITED STATES 1995 90053966.60
UNITED STATES 1994 92703268.78
UNITED STATES 1993 89110915.03
UNITED STATES 1992 91498120.29
VIETNAM 1998 50471850.47
VIETNAM 1997 85190208.09
VIETNAM 1996 85946908.61
VIETNAM 1995 87509444.01
VIETNAM 1994 86800591.58
VIETNAM 1993 84489235.18
VIETNAM 1992 84345673.55

--- 175 row(s) selected.
>>
>>prepare x2 from select
+>nation,
+>o_year,
+>cast(sum(amount) as numeric(18,2)) as sum_profit
+>from
+>(
+>select
+>n_name as nation,
+>extract(year from o_orderdate) as o_year,
+>l_extendedprice * (1 - l_discount) - ps_supplycost * l_quantity as amount
+>from
+>part,
+>supplier,
+>lineitem,
+>partsupp,
+>orders,
+>nation
+>where
+>s_suppkey = l_suppkey
+>and ps_suppkey = l_suppkey
+>and ps_partkey = l_partkey
+>and mytest.myudf(p_partkey) = mytest.myudf(l_partkey)
+>and o_orderkey = l_orderkey
+>and s_nationkey = n_nationkey
+>and p_name like '%maroon%'
+>) as profit
+>group by
+>nation,
+>o_year
+>order by
+>nation,
+>o_year desc;

*** WARNING[6007] Multi-column statistics for columns (L_PARTKEY, L_SUPPKEY) from table TRAFODION.G_TPCH2X.LINEITEM were not available. The columns were being used by Join operator. As a result, the access path chosen might not be the best possible.

*** WARNING[6007] Multi-column statistics for columns (PS_PARTKEY, PS_SUPPKEY) from table TRAFODION.G_TPCH2X.PARTSUPP were not available. The columns were being used by Join operator. As a result, the access path chosen might not be the best possible.

--- SQL command prepared.
>>
>>explain options 'f' x2;

LC RC OP OPERATOR OPT DESCRIPTION CARD
---- ---- ---- -------------------- -------- -------------------- ---------

28 . 29 root 1.75E+002
27 . 28 sort 1.75E+002
26 . 27 esp_exchange 1:8(hash2) 1.75E+002
25 . 26 hash_partial_groupby 1.75E+002
24 . 25 esp_exchange 8(hash2):8(hash2) 1.75E+002
23 . 24 sort_partial_groupby 1.75E+002
20 22 23 nested_join 3.00E+018
21 . 22 probe_cache 7.50E+007
. . 21 isolated_scalar_udf MYUDF 7.50E+007
19 . 20 sort 4.00E+010
18 2 19 hybrid_hash_join 4.00E+010
15 17 18 nested_join 1.60E+006
16 . 17 probe_cache 1.00E+000
. . 16 isolated_scalar_udf MYUDF 1.00E+000
14 4 15 hybrid_hash_join 1.60E+006
13 6 14 hybrid_hash_join 1.60E+006
12 7 13 hybrid_hash_join 1.60E+006
11 . 12 esp_exchange 8(hash2):8(hash2) 1.60E+006
10 8 11 hybrid_hash_join 1.60E+006
9 . 10 esp_exchange 8(hash2):8(hash2) 1.27E+007
. . 9 trafodion_scan LINEITEM 1.27E+007
. . 8 trafodion_scan PARTSUPP 1.60E+006
. . 7 trafodion_scan ORDERS 3.00E+006
5 . 6 esp_exchange 8(rep-b):8(hash2) 2.00E+004
. . 5 trafodion_scan SUPPLIER 2.00E+004
3 . 4 esp_exchange 8(rep-b):8(hash2) 2.50E+001
. . 3 trafodion_scan NATION 2.50E+001
1 . 2 esp_exchange 8(rep-b):8(hash2) 2.50E+004
. . 1 trafodion_scan PART 2.50E+004

--- SQL operation complete.
>>execute x2;
#
# A fatal error has been detected by the Java Runtime Environment:
#
# SIGBUS (0x7) at pc=0x00007ffff2929ad0, pid=8095, tid=140737353869664
#
# JRE version: Java(TM) SE Runtime Environment (7.0_67-b01) (build 1.7.0_67-b01)
# Java VM: Java HotSpot(TM) 64-Bit Server VM (24.65-b04 mixed mode linux-amd64 compressed oops)
# Problematic frame:
# #
# A fatal error has been detected by the Java Runtime Environment:
#
# SIGBUS (0x7) at pc=0x00007ffff2929ad0, pid=8096, tid=140737353869664
#
# JRE version: Java(TM) SE Runtime Environment (7.0_67-b01) (build 1.7.0_67-b01)
# Java VM: Java HotSpot(TM) 64-Bit Server VM (24.65-b04 mixed mode linux-amd64 compressed oops)
# Problematic frame:
# C [libc.so.6+0x89ad0]C [libc.so.6+0x89ad0] memcpy+0x2b0
#
# Core dump written. Default location: /opt/home/trafodion/v0303/sql/scripts/core or core.8095
#
  memcpy+0x2b0
#
# Core dump written. Default location: /opt/home/trafodion/v0303/sql/scripts/core or core.8096
#
# An error report file with more information is saved as:
# /opt/home/trafodion/v0303/sql/scripts/hs_err_pid8095.log
# An error report file with more information is saved as:
# /opt/home/trafodion/v0303/sql/scripts/hs_err_pid8096.log
#
# If you would like to submit a bug report, please visit:
# http://bugreport.sun.com/bugreport/crash.jsp
# The crash happened outside the Java Virtual Machine in native code.
# See problematic frame for where to report the bug.
#
#
# If you would like to submit a bug report, please visit:
# http://bugreport.sun.com/bugreport/crash.jsp
# The crash happened outside the Java Virtual Machine in native code.
# See problematic frame for where to report the bug.
#

*** ERROR[2034] $Z00050F:64: Operating system error 201 while communicating with server process $Z0006LA:71.

*** ERROR[2034] $Z00050F:64: Operating system error 201 while communicating with server process $Z0006LA:71.

*** ERROR[2034] $Z00050F:64: Operating system error 201 while communicating with server process $Z0006LA:71.

*** ERROR[2034] $Z00050F:64: Operating system error 201 while communicating with server process $Z0006LA:71.

*** ERROR[2034] $Z00050F:64: Operating system error 201 while communicating with server process $Z0006LA:71.

*** ERROR[2034] $Z00050F:64: Operating system error 201 while communicating with server process $Z0006LA:71.

*** ERROR[2034] $Z00050F:64: Operating system error 201 while communicating with server process $Z0006LA:71.

--- 0 row(s) selected.
>>
>>drop function mytest.myudf cascade;

--- SQL operation complete.
>>drop library mytest.qa_udf_lib cascade;

--- SQL operation complete.
>>drop schema mytest cascade;

--- SQL operation complete.

Tags: sql-exe
Revision history for this message
Weishiun Tsai (wei-shiun-tsai) wrote :
Changed in trafodion:
assignee: nobody → Suresh Subbiah (suresh-subbiah)
status: New → In Progress
Revision history for this message
Suresh Subbiah (suresh-subbiah) wrote :
Download full text (10.5 KiB)

This error is occurring because we are running out of disk space for overflow files. On a workstation this query runs for 2 hrs and gives the same error with core files that have 0 bytes. A few seconds before the error we have 634 * 2GB = 1268 GB of overflow files

lsof +L1 | grep SCR | wc -l
634

df shows that we are almost out of disk space
df -h
Filesystem Size Used Avail Use% Mounted on
/dev/mapper/vg01-lvol1 1.9T 1.8T 1.6G 100% /opt/home

the query has 2 big overflowing operators. We are about 50% through the first one, node 42 in the stats report below. (17 Billion rows out of an expected 40 billion).

The core file can be avoided if we use the cqd OVERFLOW_MODE 'DISK'. By default we use memory mapped IO for overflow. When overflow runs out of disk space it generates a core file with memcpy or something similar in the trace as shown here. I think that is unavoidable.

One could say that the compiler could generate a better plan for this query and possibly evaluate the UDFs before the big join. That would be a wishlist item. I would like to close this bug report or change it to a wishlist against the UDF plan quality.

Thanks.

get statistics for qid MXID11000020789212294050944156777000000000206U3333300_892_X2 default ;
..
Qid MXID11000020789212294050944156777000000000206U3333300_892_X2
Compile Start Time 2015/03/25 13:43:15.160008
Compile End Time 2015/03/25 13:43:16.931895
Compile Elapsed Time 0:00:01.771887
Execute Start Time 2015/03/25 13:43:16.969590
Execute End Time -1
Execute Elapsed Time 1:59:46.852797
State OPEN
Rows Affected -1
SQL Error Code 0
Stats Error Code 0
Query Type SQL_SELECT_NON_UNIQUE
Sub Query Type SQL_STMT_NA
Estimated Accessed Rows 0
Estimated Used Rows 0
Parent Qid NONE
Parent Query System NONE
Child Qid NONE
Number of SQL Processes 40
Number of Cpus 2
Transaction Id -1
Source String select nation, o_year, cast(sum(amount) as numeric(18,2)) as sum_profit from ( select n_name as nation, extract(year from o_orderdate) as o_year, l_extendedprice * (1 - l_discount) - ps_supplycost * l_quantity as amount from part, supplier, lineitem, par
SQL Source Length 562
Rows Returned 0
First Row Returned Time -1
Last Error before AQR 0
Number of AQR retries 0
Delay before AQR 0
No. of times reclaimed 0
Cancel Time -1
Last Suspend Time -1
Stats Collection Type OPERATOR_STATS

LC RC Id PaId ExId Frag TDB Name DOP Dispatches Oper CPU Time Est. Records Used Act. Records Used Details

57 . 58 . 31 0 EX_ROOT 1 1 52 0 0 963
56 . 57 58 30 0 EX_SORT 0 1 4 175 0 0|0|0
55 . 56 57 29 0 EX_SPLIT_TOP 1 1 14 175 0
54 ....

Revision history for this message
Suresh Subbiah (suresh-subbiah) wrote :

A fix by Prashanth for https://bugs.launchpad.net/trafodion/+bug/1389784 may address the problem of poor error reporting by overflow operators when MMIO is used. With that fix we now get this error.
*** ERROR[10005] Sort encountered an unexpected error. Details: SQScratchFile::SQScratchFile(5) System Error 28. Error detail 0

Maybe this case can be kept open with lower priority to track the plan quality issue?

Changed in trafodion:
status: In Progress → Fix Committed
Revision history for this message
Weishiun Tsai (wei-shiun-tsai) wrote :
Download full text (4.3 KiB)

Tried it on the v0519 build installed on a 6-node cluster. This query now sees the 10005 error. With 2 UDFs on important keys, the query plan generated is pretty bad with 2 nested joins. Such a 10005 error is likely due to the bad query plan. If a customer runs into this problem, development will likely have to help with the query plan by tuning CQDs. For now, this case will be closed.

>>prepare x2 from select
+>nation,
+>o_year,
+>cast(sum(amount) as numeric(18,2)) as sum_profit
+>from
+>(
+>select
+>n_name as nation,
+>extract(year from o_orderdate) as o_year,
+>l_extendedprice * (1 - l_discount) - ps_supplycost * l_quantity as amount
+>from
+>part,
+>supplier,
+>lineitem,
+>partsupp,
+>orders,
+>nation
+>where
+>s_suppkey = l_suppkey
+>and ps_suppkey = l_suppkey
+>and ps_partkey = l_partkey
+>and mytest.myudf(p_partkey) = mytest.myudf(l_partkey)
+>and o_orderkey = l_orderkey
+>and s_nationkey = n_nationkey
+>and p_name like '%maroon%'
+>) as profit
+>group by
+>nation,
+>o_year
+>order by
+>nation,
+>o_year desc;

*** WARNING[6007] Multi-column statistics for columns (L_PARTKEY, L_SUPPKEY) from table TRAFODION.G_TPCH2X.LINEITEM were not available. The columns were being used by Join operator. As a result, the access path chosen might not be the best possible.

*** WARNING[6007] Multi-column statistics for columns (PS_PARTKEY, PS_SUPPKEY) from table TRAFODION.G_TPCH2X.PARTSUPP were not available. The columns were being used by Join operator. As a result, the access path chosen might not be the best possible.

--- SQL command prepared.
>>
>>explain options 'f' x2;

LC RC OP OPERATOR OPT DESCRIPTION CARD
---- ---- ---- -------------------- -------- -------------------- ---------

30 . 31 root 1.75E+002
29 . 30 sort 1.75E+002
28 . 29 esp_exchange 1:12(hash2) 1.75E+002
27 . 28 hash_partial_groupby 1.75E+002
26 . 27 esp_exchange 12(hash2):12(hash2) 1.75E+002
25 . 26 sort_partial_groupby 1.75E+002
22 24 25 nested_join 3.00E+018
23 . 24 probe_cache 7.50E+007
. . 23 isolated_scalar_udf MYUDF 7.50E+007
21 . 22 sort 4.00E+010
20 2 21 hybrid_hash_join 4.00E+010
17 19 20 nested_join 1.60E+006
18 . 19 probe_cache 1.00E+000
. . 18 isolated_scalar_udf MYUDF 1.00E+000
16 4 17 hybrid_hash_join 1.60E+006
15 6 16 hybrid_hash_join 1.60E+006
14 8 15 hybrid_hash_join 1.60E+006
13 . 14 esp_exchange 12(hash2):12(hash2) 1.60E+006
12 10 13 hybrid_...

Read more...

Changed in trafodion:
status: Fix Committed → Won't Fix
To post a comment you must log in.
This report contains Public information  
Everyone can see this information.

Other bug subscribers

Bug attachments

Remote bug watches

Bug watches keep track of this bug in other bug trackers.