Upsert into a table with index hangs and hogs system resource

Bug #1417337 reported by Weishiun Tsai
6
This bug affects 1 person
Affects Status Importance Assigned to Milestone
Trafodion
Won't Fix
Critical
Joanie Cooper

Bug Description

As shown here, 2 tables: t1 and t2 are identical, except that t2 has an index i2 created for it. 2 identical ‘upsert using load’ queries x1 and x2 are then prepared for t1 and t2 respectively. The query upserts 1000000 rows. Plan x2 includes 2 nested_joins and a trafodion_insert compared to plan x1’s trafodion_load. On either a Hortonworks machine or a Cloudera machine, x1 can generally complete in around 30 seconds on our 4-node systems, but x2 always ‘hangs’. In the latest try, it hanged for 4 hours without coming back. At that point, one of the hbase region servers has used 9.9g virtual memory, 6.6g resident memory, and the CUP usage hovered around 600%-700%. Somewhere along the way, it also started to have trouble with monitors with messages like the followings shown in $SQ_HOME/logs/mon.xxx.log:

02/02/2015-22:52:04.796.430: (name=monitor/pid=24127/tid=24131) (cmp=1/ev=101130801/fac=320/sev=6): STDERR redirected from n001.$Z000HNQ.0.21655: [$Z000HNQ] Exiting
02/02/2015-22:52:04.896.247: (name=monitor/pid=24127/tid=24131) (cmp=1/ev=101130801/fac=320/sev=6): STDERR redirected from n001.$Z000HQM.0.21722: [$Z000HQM] Exiting
02/03/2015-00:05:05.891.488: (name=monitor/pid=24127/tid=24131) (cmp=1/ev=101130801/fac=320/sev=6): STDERR redirected from n001.$Z0005V1.0.7176: [$Z0005V1] Exiting
02/03/2015-00:05:05.989.256: (name=monitor/pid=24127/tid=24131) (cmp=1/ev=101130801/fac=320/sev=6): STDERR redirected from n001.$Z0005XF.0.7260: [$Z0005XF] Exiting

This is the kind of Trafodion hardening issue regarding overall usability/stability for post-1.0 releases, so the bug report is filed as a critical one. The problem is fairly reproducible on our systems using the v1.0.0 rc3 build. It requires the QA global table g_wisc32.abase, which has 32000000 rows.

----------------------------------------------

Here is the entire script to reproduce it (it requires the QA global table g_wisc32.abase):

log mytest.log clear;
drop schema mytest9 cascade;
create schema mytest9;
set schema mytest9;
showddl trafodion.g_wisc32.abase;
showstats for table trafodion.g_wisc32.abase on every column;

create table t1
  (
    UNIQUE1 INT NO DEFAULT NOT NULL NOT DROPPABLE
  , UNIQUE2 INT NO DEFAULT NOT NULL NOT DROPPABLE
  , TWO INT NO DEFAULT NOT NULL NOT DROPPABLE
  , FOUR INT NO DEFAULT NOT NULL NOT DROPPABLE
  , TEN INT NO DEFAULT NOT NULL NOT DROPPABLE
  , TWENTY INT NO DEFAULT NOT NULL NOT DROPPABLE
  , ONEPERCENT INT NO DEFAULT NOT NULL NOT DROPPABLE
  , TENPERCENT INT NO DEFAULT NOT NULL NOT DROPPABLE
  , TWENTYPERCENT INT NO DEFAULT NOT NULL NOT DROPPABLE
  , FIFTYPERCENT INT NO DEFAULT NOT NULL NOT DROPPABLE
  , UNIQUE3 INT NO DEFAULT NOT NULL NOT DROPPABLE
  , EVENONEPERCENT INT NO DEFAULT NOT NULL NOT DROPPABLE
  , ODDONEPERCENT INT NO DEFAULT NOT NULL NOT DROPPABLE
  , STRINGU1 CHAR(52) CHARACTER SET ISO88591 COLLATE
      DEFAULT NO DEFAULT NOT NULL NOT DROPPABLE
  , STRINGU2 CHAR(52) CHARACTER SET ISO88591 COLLATE
      DEFAULT NO DEFAULT NOT NULL NOT DROPPABLE
  , STRING4 CHAR(52) CHARACTER SET ISO88591 COLLATE
      DEFAULT NO DEFAULT NOT NULL NOT DROPPABLE
  , PRIMARY KEY (UNIQUE2 ASC)
  )
;

create table t2
  (
    UNIQUE1 INT NO DEFAULT NOT NULL NOT DROPPABLE
  , UNIQUE2 INT NO DEFAULT NOT NULL NOT DROPPABLE
  , TWO INT NO DEFAULT NOT NULL NOT DROPPABLE
  , FOUR INT NO DEFAULT NOT NULL NOT DROPPABLE
  , TEN INT NO DEFAULT NOT NULL NOT DROPPABLE
  , TWENTY INT NO DEFAULT NOT NULL NOT DROPPABLE
  , ONEPERCENT INT NO DEFAULT NOT NULL NOT DROPPABLE
  , TENPERCENT INT NO DEFAULT NOT NULL NOT DROPPABLE
  , TWENTYPERCENT INT NO DEFAULT NOT NULL NOT DROPPABLE
  , FIFTYPERCENT INT NO DEFAULT NOT NULL NOT DROPPABLE
  , UNIQUE3 INT NO DEFAULT NOT NULL NOT DROPPABLE
  , EVENONEPERCENT INT NO DEFAULT NOT NULL NOT DROPPABLE
  , ODDONEPERCENT INT NO DEFAULT NOT NULL NOT DROPPABLE
  , STRINGU1 CHAR(52) CHARACTER SET ISO88591 COLLATE
      DEFAULT NO DEFAULT NOT NULL NOT DROPPABLE
  , STRINGU2 CHAR(52) CHARACTER SET ISO88591 COLLATE
      DEFAULT NO DEFAULT NOT NULL NOT DROPPABLE
  , STRING4 CHAR(52) CHARACTER SET ISO88591 COLLATE
      DEFAULT NO DEFAULT NOT NULL NOT DROPPABLE
  , PRIMARY KEY (UNIQUE2 ASC)
  )
;

create index i2 on t2 (two, four, unique1, unique3);

showddl t1;
showddl t2;

select count(*) from trafodion.g_wisc32.abase;
select count(*) from trafodion.g_wisc32.abase where unique2 < 1000000;

prepare x1 from upsert using load into t1 select * from trafodion.g_wisc32.abase where unique2 < 1000000;
prepare x2 from upsert using load into t2 select * from trafodion.g_wisc32.abase where unique2 < 1000000;
explain options 'f' x1;
explain options 'f' x2;
explain x1;
explain x2;

sh date;
execute x1;
sh date;
drop table t1 cascade;
sh date;
execute x2;
sh date;
drop table t2 cascade;
sh date;
drop schema mytest9;

----------------------------------------------

Here is the execution output up to the point when ‘execute x2’ hanged for 4 hours:

>>drop schema mytest9 cascade;

*** ERROR[1003] Schema TRAFODION.MYTEST9 does not exist.

--- SQL operation failed with errors.
>>create schema mytest9;

--- SQL operation complete.
>>set schema mytest9;

--- SQL operation complete.
>>showddl trafodion.g_wisc32.abase;

CREATE TABLE TRAFODION.G_WISC32.ABASE
  (
    UNIQUE1 INT NO DEFAULT NOT NULL NOT DROPPABLE
  , UNIQUE2 INT NO DEFAULT NOT NULL NOT DROPPABLE
  , TWO INT NO DEFAULT NOT NULL NOT DROPPABLE
  , FOUR INT NO DEFAULT NOT NULL NOT DROPPABLE
  , TEN INT NO DEFAULT NOT NULL NOT DROPPABLE
  , TWENTY INT NO DEFAULT NOT NULL NOT DROPPABLE
  , ONEPERCENT INT NO DEFAULT NOT NULL NOT DROPPABLE
, TENPERCENT INT NO DEFAULT NOT NULL NOT DROPPABLE
  , TWENTYPERCENT INT NO DEFAULT NOT NULL NOT DROPPABLE
  , FIFTYPERCENT INT NO DEFAULT NOT NULL NOT DROPPABLE
  , UNIQUE3 INT NO DEFAULT NOT NULL NOT DROPPABLE
  , EVENONEPERCENT INT NO DEFAULT NOT NULL NOT DROPPABLE
  , ODDONEPERCENT INT NO DEFAULT NOT NULL NOT DROPPABLE
  , STRINGU1 CHAR(52) CHARACTER SET ISO88591 COLLATE
      DEFAULT NO DEFAULT NOT NULL NOT DROPPABLE
  , STRINGU2 CHAR(52) CHARACTER SET ISO88591 COLLATE
      DEFAULT NO DEFAULT NOT NULL NOT DROPPABLE
  , STRING4 CHAR(52) CHARACTER SET ISO88591 COLLATE
      DEFAULT NO DEFAULT NOT NULL NOT DROPPABLE
  , PRIMARY KEY (UNIQUE2 ASC)
  )
  SALT USING 8 PARTITIONS
;

CREATE UNIQUE INDEX IXA4 ON TRAFODION.G_WISC32.ABASE
  (
    UNIQUE3 ASC
  , TWENTYPERCENT ASC
  , FIFTYPERCENT ASC
  )
;

--- SQL operation complete.
>>showstats for table trafodion.g_wisc32.abase on every column;

Histogram data for Table TRAFODION.G_WISC32.ABASE
Table ID: 533733601026556210

   Hist ID # Ints Rowcount UEC Colname(s)
========== ====== =========== =========== ===========================
2094748496 38 32334424 32334424 UNIQUE1
2094748491 31 32334424 32334424 UNIQUE2
2094748486 2 32334424 2 TWO
2094748481 4 32334424 4 FOUR
2094748476 10 32334424 10 TEN
2094748471 20 32334424 20 TWENTY
2094748466 77 32334424 100 ONEPERCENT
2094748461 10 32334424 10 TENPERCENT
2094748456 5 32334424 5 TWENTYPERCENT
2094748451 2 32334424 2 FIFTYPERCENT
2094748446 38 32334424 32334424 UNIQUE3
2094748441 77 32334424 100 EVENONEPERCENT
2094748436 77 32334424 100 ODDONEPERCENT
2094748431 62 32334424 32334424 STRINGU1
2094748426 62 32334424 32334424 STRINGU2
2094748421 4 32334424 4 STRING4
2094748416 8 32334424 8 "_SALT_"
2094748501 1 32334424 32334424 "_SALT_", UNIQUE2

--- SQL operation complete.
>>
>>create table t1
+> (
+> UNIQUE1 INT NO DEFAULT NOT NULL NOT DROPPABLE
+> , UNIQUE2 INT NO DEFAULT NOT NULL NOT DROPPABLE
+> , TWO INT NO DEFAULT NOT NULL NOT DROPPABLE
+> , FOUR INT NO DEFAULT NOT NULL NOT DROPPABLE
+> , TEN INT NO DEFAULT NOT NULL NOT DROPPABLE
+> , TWENTY INT NO DEFAULT NOT NULL NOT DROPPABLE
+> , ONEPERCENT INT NO DEFAULT NOT NULL NOT DROPPABLE
+> , TENPERCENT INT NO DEFAULT NOT NULL NOT DROPPABLE
+> , TWENTYPERCENT INT NO DEFAULT NOT NULL NOT DROPPABLE
+> , FIFTYPERCENT INT NO DEFAULT NOT NULL NOT DROPPABLE
+> , UNIQUE3 INT NO DEFAULT NOT NULL NOT DROPPABLE
+> , EVENONEPERCENT INT NO DEFAULT NOT NULL NOT DROPPABLE
+> , ODDONEPERCENT INT NO DEFAULT NOT NULL NOT DROPPABLE
+> , STRINGU1 CHAR(52) CHARACTER SET ISO88591 COLLATE
+> DEFAULT NO DEFAULT NOT NULL NOT DROPPABLE
+> , STRINGU2 CHAR(52) CHARACTER SET ISO88591 COLLATE
+> DEFAULT NO DEFAULT NOT NULL NOT DROPPABLE
+> , STRING4 CHAR(52) CHARACTER SET ISO88591 COLLATE
+> DEFAULT NO DEFAULT NOT NULL NOT DROPPABLE
+> , PRIMARY KEY (UNIQUE2 ASC)
+> )
+>;

--- SQL operation complete.
>>
>>create table t2
+> (
+> UNIQUE1 INT NO DEFAULT NOT NULL NOT DROPPABLE
+> , UNIQUE2 INT NO DEFAULT NOT NULL NOT DROPPABLE
+> , TWO INT NO DEFAULT NOT NULL NOT DROPPABLE
+> , FOUR INT NO DEFAULT NOT NULL NOT DROPPABLE
+> , TEN INT NO DEFAULT NOT NULL NOT DROPPABLE
+> , TWENTY INT NO DEFAULT NOT NULL NOT DROPPABLE
+> , ONEPERCENT INT NO DEFAULT NOT NULL NOT DROPPABLE
+> , TENPERCENT INT NO DEFAULT NOT NULL NOT DROPPABLE
+> , TWENTYPERCENT INT NO DEFAULT NOT NULL NOT DROPPABLE
+> , FIFTYPERCENT INT NO DEFAULT NOT NULL NOT DROPPABLE
+> , UNIQUE3 INT NO DEFAULT NOT NULL NOT DROPPABLE
+> , EVENONEPERCENT INT NO DEFAULT NOT NULL NOT DROPPABLE
+> , ODDONEPERCENT INT NO DEFAULT NOT NULL NOT DROPPABLE
+> , STRINGU1 CHAR(52) CHARACTER SET ISO88591 COLLATE
+> DEFAULT NO DEFAULT NOT NULL NOT DROPPABLE
+> , STRINGU2 CHAR(52) CHARACTER SET ISO88591 COLLATE
+> DEFAULT NO DEFAULT NOT NULL NOT DROPPABLE
+> , STRING4 CHAR(52) CHARACTER SET ISO88591 COLLATE
+> DEFAULT NO DEFAULT NOT NULL NOT DROPPABLE
+> , PRIMARY KEY (UNIQUE2 ASC)
+> )
+>;

--- SQL operation complete.
>>
>>create index i2 on t2 (two, four, unique1, unique3);

--- SQL operation complete.
>>
>>showddl t1;

CREATE TABLE TRAFODION.MYTEST9.T1
  (
    UNIQUE1 INT NO DEFAULT NOT NULL NOT DROPPABLE
  , UNIQUE2 INT NO DEFAULT NOT NULL NOT DROPPABLE
  , TWO INT NO DEFAULT NOT NULL NOT DROPPABLE
  , FOUR INT NO DEFAULT NOT NULL NOT DROPPABLE
  , TEN INT NO DEFAULT NOT NULL NOT DROPPABLE
  , TWENTY INT NO DEFAULT NOT NULL NOT DROPPABLE
  , ONEPERCENT INT NO DEFAULT NOT NULL NOT DROPPABLE
  , TENPERCENT INT NO DEFAULT NOT NULL NOT DROPPABLE
  , TWENTYPERCENT INT NO DEFAULT NOT NULL NOT DROPPABLE
  , FIFTYPERCENT INT NO DEFAULT NOT NULL NOT DROPPABLE
  , UNIQUE3 INT NO DEFAULT NOT NULL NOT DROPPABLE
  , EVENONEPERCENT INT NO DEFAULT NOT NULL NOT DROPPABLE
  , ODDONEPERCENT INT NO DEFAULT NOT NULL NOT DROPPABLE
  , STRINGU1 CHAR(52) CHARACTER SET ISO88591 COLLATE
      DEFAULT NO DEFAULT NOT NULL NOT DROPPABLE
  , STRINGU2 CHAR(52) CHARACTER SET ISO88591 COLLATE
      DEFAULT NO DEFAULT NOT NULL NOT DROPPABLE
  , STRING4 CHAR(52) CHARACTER SET ISO88591 COLLATE
      DEFAULT NO DEFAULT NOT NULL NOT DROPPABLE
  , PRIMARY KEY (UNIQUE2 ASC)
  )
;

--- SQL operation complete.
>>showddl t2;

CREATE TABLE TRAFODION.MYTEST9.T2
  (
    UNIQUE1 INT NO DEFAULT NOT NULL NOT DROPPABLE
  , UNIQUE2 INT NO DEFAULT NOT NULL NOT DROPPABLE
  , TWO INT NO DEFAULT NOT NULL NOT DROPPABLE
  , FOUR INT NO DEFAULT NOT NULL NOT DROPPABLE
  , TEN INT NO DEFAULT NOT NULL NOT DROPPABLE
  , TWENTY INT NO DEFAULT NOT NULL NOT DROPPABLE
  , ONEPERCENT INT NO DEFAULT NOT NULL NOT DROPPABLE
  , TENPERCENT INT NO DEFAULT NOT NULL NOT DROPPABLE
  , TWENTYPERCENT INT NO DEFAULT NOT NULL NOT DROPPABLE
  , FIFTYPERCENT INT NO DEFAULT NOT NULL NOT DROPPABLE
  , UNIQUE3 INT NO DEFAULT NOT NULL NOT DROPPABLE
  , EVENONEPERCENT INT NO DEFAULT NOT NULL NOT DROPPABLE
  , ODDONEPERCENT INT NO DEFAULT NOT NULL NOT DROPPABLE
  , STRINGU1 CHAR(52) CHARACTER SET ISO88591 COLLATE
      DEFAULT NO DEFAULT NOT NULL NOT DROPPABLE
  , STRINGU2 CHAR(52) CHARACTER SET ISO88591 COLLATE
      DEFAULT NO DEFAULT NOT NULL NOT DROPPABLE
  , STRING4 CHAR(52) CHARACTER SET ISO88591 COLLATE
      DEFAULT NO DEFAULT NOT NULL NOT DROPPABLE
  , PRIMARY KEY (UNIQUE2 ASC)
  )
;

CREATE INDEX I2 ON TRAFODION.MYTEST9.T2
  (
    TWO ASC
  , FOUR ASC
  , UNIQUE1 ASC
  , UNIQUE3 ASC
  )
;

--- SQL operation complete.
>>
>>select count(*) from trafodion.g_wisc32.abase;

(EXPR)
--------------------

            32000000

--- 1 row(s) selected.
>>select count(*) from trafodion.g_wisc32.abase where unique2 < 1000000;

(EXPR)
--------------------

             1000000

--- 1 row(s) selected.
>>
>>prepare x1 from upsert using load into t1 select * from trafodion.g_wisc32.aba
se where unique2 < 1000000;

--- SQL command prepared.
>>prepare x2 from upsert using load into t2 select * from trafodion.g_wisc32.aba
se where unique2 < 1000000;

--- SQL command prepared.
>>explain options 'f' x1;

LC RC OP OPERATOR OPT DESCRIPTION CARD
---- ---- ---- -------------------- -------- -------------------- ---------

5 . 6 root 1.00E+006
4 . 5 esp_exchange 1:8(hash2) 1.00E+006
2 3 4 tuple_flow 1.00E+006
. . 3 trafodion_load T1 1.00E+000
1 . 2 sort 1.00E+006
. . 1 trafodion_scan ABASE 1.00E+006

--- SQL operation complete.
>>explain options 'f' x2;

LC RC OP OPERATOR OPT DESCRIPTION CARD
---- ---- ---- -------------------- -------- -------------------- ---------

7 . 8 root x 1.00E+006
6 . 7 esp_exchange 1:8(hash2) 1.00E+006
4 5 6 nested_join 1.00E+006
. . 5 trafodion_insert I2 1.00E+000
2 3 4 nested_join 1.00E+006
. . 3 trafodion_ T2 1.00E+000
1 . 2 sort 1.00E+006
. . 1 trafodion_scan ABASE 1.00E+006
--- SQL operation complete.
>>explain x1;

------------------------------------------------------------------ PLAN SUMMARY
MODULE_NAME .............. DYNAMICALLY COMPILED
STATEMENT_NAME ........... X1
PLAN_ID .................. 212289671902829625
ROWS_OUT ......... 1,007,717
EST_TOTAL_COST ........... 1.52
STATEMENT ................ upsert using load into t1 select * from
                             trafodion.g_wisc32.abase where unique2 < 1000000;

------------------------------------------------------------------ NODE LISTING
ROOT ====================================== SEQ_NO 6 ONLY CHILD 5
REQUESTS_IN .............. 1
ROWS_OUT ......... 1,007,717
EST_OPER_COST ............ 0
EST_TOTAL_COST ........... 1.52
DESCRIPTION
  max_card_est ........... 1.00772e+06
  fragment_id ............ 0
parent_frag ............ (none)
  fragment_type .......... master
  statement_index ........ 0
  affinity_value ......... 0
  est_memory_per_cpu ..... 25586 KB
  max_max_cardinal 1,007,717
  total_overflow_size .... 0.00 KB
  esp_2_node_map ......... (\NSK:-1:-1:-1:-1:-1:-1:-1:-1)
  xn_access_mode ......... read_only
  xn_autoabort_interval 0
  auto_query_retry ....... enabled
  plan_version ....... 2,600
  embedded_arkcmp ........ used
  IS_SQLCI ............... ON
  LDAP_USERNAME
  SCHEMA ................. MYTEST9
  ObjectUIDs ............. 533733601026556210, 28205037036521192

ESP_EXCHANGE ============================== SEQ_NO 5 ONLY CHILD 4
REQUESTS_IN .............. 1
ROWS_OUT ......... 1,007,717
EST_OPER_COST ............ 0.01
EST_TOTAL_COST ........... 1.52
DESCRIPTION
  max_card_est ........... 1.00772e+06
  fragment_id ............ 2
  parent_frag ............ 0
  fragment_type .......... esp
  buffer_size ........ 5,000
  record_length .......... 0
  parent_processes ....... 1
  child_processes ........ 8
  child_partitioning_func hash2 partitioned 8 ways on
                             (TRAFODION.G_WISC32.ABASE.UNIQUE2)
  seamonster_query ....... no
  seamonster_exchange .... no

TUPLE_FLOW ================================ SEQ_NO 4 CHILDREN 2, 3
REQUESTS_IN .............. 1
ROWS_OUT ......... 1,007,717
EST_OPER_COST ............ 0.01
EST_TOTAL_COST ........... 1.52
DESCRIPTION
  max_card_est ........... 1.00772e+06
fragment_id ............ 2
  parent_frag ............ 0
  fragment_type .......... esp
  join_type .............. inner
  join_method ............ in-order nested
  parallel_join_type ..... 1 (OCR Outer Child Repartitioning)

TRAFODION_LOAD ============================ SEQ_NO 3 NO CHILDREN
TABLE_NAME ............... TRAFODION.MYTEST9.T1
REQUESTS_IN .............. 1
ROWS_OUT ................. 1
EST_OPER_COST ............ 0
EST_TOTAL_COST ........... 0
DESCRIPTION
  max_card_est ........... 1.00772e+06
  fragment_id ............ 2
  parent_frag ............ 0
  fragment_type .......... esp
  iud_type ............... trafodion_load TRAFODION.MYTEST9.T1
  new_rec_expr ........... (UNIQUE1 assign TRAFODION.G_WISC32.ABASE.UNIQUE1),
                             (UNIQUE2 assign TRAFODION.G_WISC32.ABASE.UNIQUE2),
                             (TWO assign TRAFODION.G_WISC32.ABASE.TWO),
                             (FOUR assign TRAFODION.G_WISC32.ABASE.FOUR),
                             (TEN assign TRAFODION.G_WISC32.ABASE.TEN),
                             (TWENTY assign TRAFODION.G_WISC32.ABASE.TWENTY),
                             (ONEPERCENT assign TRAFODION.G_WISC32.ABASE.ONEPER
                             CENT), (TENPERCENT assign
                             TRAFODION.G_WISC32.ABASE.TENPERCENT),
                             (TWENTYPERCENT assign TRAFODION.G_WISC32.ABASE.TWE
                             NTYPERCENT), (FIFTYPERCENT assign
                             TRAFODION.G_WISC32.ABASE.FIFTYPERCENT),
                             (UNIQUE3 assign TRAFODION.G_WISC32.ABASE.UNIQUE3),
                             (EVENONEPERCENT assign TRAFODION.G_WISC32.ABASE.EV
                             ENONEPERCENT), (ODDONEPERCENT assign
                             TRAFODION.G_WISC32.ABASE.ODDONEPERCENT),
                             (STRINGU1 assign TRAFODION.G_WISC32.ABASE.STRINGU1
                             ), (STRINGU2 assign TRAFODION.G_WISC32.ABASE.STRIN
                             GU2), (STRING4 assign TRAFODION.G_WISC32.ABASE.STR
                             ING4)

SORT ====================================== SEQ_NO 2 ONLY CHILD 1
REQUESTS_IN .............. 1
ROWS_OUT ......... 1,007,717
EST_OPER_COST ............ 0.06
EST_TOTAL_COST ........... 1.51
DESCRIPTION
  memory_quota_per_esp ... 0 MB
  max_card_est ........... 1.00772e+06
  fragment_id ............ 2
  parent_frag ............ 0
  fragment_type .......... esp
  est_memory_per_cpu ..... 25586 KB
  sort_type .............. full
  sort_key ............... TRAFODION.G_WISC32.ABASE.UNIQUE2

TRAFODION_SCAN ============================ SEQ_NO 1 NO CHILDREN
TABLE_NAME ............... ABASE
REQUESTS_IN .............. 1
ROWS_OUT ......... 1,007,717
EST_OPER_COST ............ 1.45
EST_TOTAL_COST ........... 1.45
DESCRIPTION
  max_card_est ........... 1.00772e+06
  fragment_id ............ 2
  parent_frag ............ 0
  fragment_type .......... esp
scan_type .............. subset scan limited by mdam of table
                             TRAFODION.G_WISC32.ABASE
  object_type ............ Trafodion
  cache_size ........ 10,000
  probes ................. 1
  rows_accessed .......... 1.00772e+06
  key_columns ............ _SALT_, UNIQUE2
  mdam_disjunct .......... (UNIQUE2 < 1000000) and (_SALT_ >=
                             (\:_sys_HostVarLoHashPart Hash2Distrib 8)) and
                             (_SALT_ <= (\:_sys_HostVarHiHashPart Hash2Distrib
                             8))
  part_key_predicates .... (UNIQUE2 < 1000000)

--- SQL operation complete.
>>explain x2;

------------------------------------------------------------------ PLAN SUMMARY
MODULE_NAME .............. DYNAMICALLY COMPILED
STATEMENT_NAME ........... X2
PLAN_ID .................. 212289671903453036
ROWS_OUT ......... 1,007,717
EST_TOTAL_COST ........... 1.53
STATEMENT ................ upsert using load into t2 select * from
                             trafodion.g_wisc32.abase where unique2 < 1000000;

------------------------------------------------------------------ NODE LISTING
ROOT ====================================== SEQ_NO 8 ONLY CHILD 7
REQUESTS_IN .............. 1
ROWS_OUT ......... 1,007,717
EST_OPER_COST ............ 0
EST_TOTAL_COST ........... 1.53
DESCRIPTION
  max_card_est ........... 1.00772e+06
  fragment_id ............ 0
  parent_frag ............ (none)
  fragment_type .......... master
  statement_index ........ 0
  affinity_value ......... 0
  est_memory_per_cpu ..... 25586 KB
  max_max_cardinal 1,007,717
  total_overflow_size .... 0.00 KB
  esp_2_node_map ......... (\NSK:-1:-1:-1:-1:-1:-1:-1:-1)
  upd_action_on_error .... xn_rollback
  xn_access_mode ......... read_only
  xn_autoabort_interval 0
auto_query_retry ....... enabled
  plan_version ....... 2,600
  embedded_arkcmp ........ used
  IS_SQLCI ............... ON
  LDAP_USERNAME
  SCHEMA ................. MYTEST9
  ObjectUIDs ............. 533733601026556210, 28205037036521336,
                             28205037036521591

ESP_EXCHANGE ============================== SEQ_NO 7 ONLY CHILD 6
REQUESTS_IN .............. 1
ROWS_OUT ......... 1,007,717
EST_OPER_COST ............ 0.01
EST_TOTAL_COST ........... 1.53
DESCRIPTION
  max_card_est ........... 1.00772e+06
  fragment_id ............ 2
  parent_frag ............ 0
  fragment_type .......... esp
  buffer_size ........ 5,000
  record_length .......... 0
  parent_processes ....... 1
  child_processes ........ 8
  child_partitioning_func hash2 partitioned 8 ways on
                             (TRAFODION.G_WISC32.ABASE.UNIQUE2)
  seamonster_query ....... no
  seamonster_exchange .... no

NESTED_JOIN =============================== SEQ_NO 6 CHILDREN 4, 5
REQUESTS_IN .............. 1
ROWS_OUT ......... 1,007,717
EST_OPER_COST ............ 0.01
EST_TOTAL_COST ........... 1.53
DESCRIPTION
  max_card_est ........... 1.00772e+06
  fragment_id ............ 2
  parent_frag ............ 0
  fragment_type .......... esp
  join_type .............. inner
  join_method ............ nested
  parallel_join_type ..... 2 (N2J Opens all inner partitions)

TRAFODION_INSERT ========================== SEQ_NO 5 NO CHILDREN
TABLE_NAME ............... TRAFODION.MYTEST9.I2
REQUESTS_IN .............. 1
ROWS_OUT ................. 1
EST_OPER_COST ............ 0
EST_TOTAL_COST ........... 0
DESCRIPTION
  max_card_est ........... 1.00772e+06
  fragment_id ............ 2
  parent_frag ............ 0
  fragment_type .......... esp
  iud_type ............... index_trafodion_insert TRAFODION.MYTEST9.I2
  new_rec_expr ........... ("TWO@" assign TRAFODION.MYTEST9.T2.TWO),
                             ("FOUR@" assign TRAFODION.MYTEST9.T2.FOUR),
                             ("UNIQUE1@" assign TRAFODION.MYTEST9.T2.UNIQUE1),
                             ("UNIQUE3@" assign TRAFODION.MYTEST9.T2.UNIQUE3),
                             (UNIQUE2 assign TRAFODION.MYTEST9.T2.UNIQUE2)

NESTED_JOIN =============================== SEQ_NO 4 CHILDREN 2, 3
REQUESTS_IN .............. 1
ROWS_OUT ......... 1,007,717
EST_OPER_COST ............ 0.01
EST_TOTAL_COST ........... 1.52
DESCRIPTION
  max_card_est ........... 1.00772e+06
  fragment_id ............ 2
  parent_frag ............ 0
  fragment_type .......... esp
  join_type .............. inner
  join_method ............ in-order nested
  parallel_join_type ..... 1 (OCR Outer Child Repartitioning)

TRAFODION_ ================================ SEQ_NO 3 NO CHILDREN
TABLE_NAME ............... TRAFODION.MYTEST9.T2
REQUESTS_IN .............. 1
ROWS_OUT ................. 1
EST_OPER_COST ............ 0
EST_TOTAL_COST ........... 0
DESCRIPTION
  max_card_est ........... 1.00772e+06
  fragment_id ............ 2
  parent_frag ............ 0
  fragment_type .......... esp
  iud_type ............... trafodion_ TRAFODION.MYTEST9.T2
  new_rec_expr ........... (UNIQUE1 assign TRAFODION.G_WISC32.ABASE.UNIQUE1),
                             (UNIQUE2 assign TRAFODION.G_WISC32.ABASE.UNIQUE2),
                             (TWO assign TRAFODION.G_WISC32.ABASE.TWO),
                             (FOUR assign TRAFODION.G_WISC32.ABASE.FOUR),
                             (TEN assign TRAFODION.G_WISC32.ABASE.TEN),
                             (TWENTY assign TRAFODION.G_WISC32.ABASE.TWENTY),
                             (ONEPERCENT assign TRAFODION.G_WISC32.ABASE.ONEPER
                             CENT), (TENPERCENT assign
                             TRAFODION.G_WISC32.ABASE.TENPERCENT),
                             (TWENTYPERCENT assign TRAFODION.G_WISC32.ABASE.TWE
                             NTYPERCENT), (FIFTYPERCENT assign
                             TRAFODION.G_WISC32.ABASE.FIFTYPERCENT),
                             (UNIQUE3 assign TRAFODION.G_WISC32.ABASE.UNIQUE3),
                             (EVENONEPERCENT assign TRAFODION.G_WISC32.ABASE.EV
                             ENONEPERCENT), (ODDONEPERCENT assign
                             TRAFODION.G_WISC32.ABASE.ODDONEPERCENT),
                             (STRINGU1 assign TRAFODION.G_WISC32.ABASE.STRINGU1
                             ), (STRINGU2 assign TRAFODION.G_WISC32.ABASE.STRIN
                             GU2), (STRING4 assign TRAFODION.G_WISC32.ABASE.STR
                             ING4)

SORT ====================================== SEQ_NO 2 ONLY CHILD 1
REQUESTS_IN .............. 1
ROWS_OUT ......... 1,007,717
EST_OPER_COST ............ 0.06
EST_TOTAL_COST ........... 1.51
DESCRIPTION
  memory_quota_per_esp ... 0 MB
  max_card_est ........... 1.00772e+06
  fragment_id ............ 2
  parent_frag ............ 0
  fragment_type .......... esp
  est_memory_per_cpu ..... 25586 KB
  sort_type .............. full
  sort_key ............... TRAFODION.G_WISC32.ABASE.UNIQUE2

TRAFODION_SCAN ============================ SEQ_NO 1 NO CHILDREN
TABLE_NAME ............... ABASE
REQUESTS_IN .............. 1
ROWS_OUT ......... 1,007,717
EST_OPER_COST ............ 1.45
EST_TOTAL_COST ........... 1.45
DESCRIPTION
  max_card_est ........... 1.00772e+06
  fragment_id ............ 2
parent_frag ............ 0
  fragment_type .......... esp
  scan_type .............. subset scan limited by mdam of table
                             TRAFODION.G_WISC32.ABASE
  object_type ............ Trafodion
  cache_size ........ 10,000
  probes ................. 1
  rows_accessed .......... 1.00772e+06
  key_columns ............ _SALT_, UNIQUE2
  mdam_disjunct .......... (UNIQUE2 < 1000000) and (_SALT_ >=
                             (\:_sys_HostVarLoHashPart Hash2Distrib 8)) and
                             (_SALT_ <= (\:_sys_HostVarHiHashPart Hash2Distrib
                             8))
  part_key_predicates .... (UNIQUE2 < 1000000)

--- SQL operation complete.
>>
>>sh date;
>>execute x1;

--- 1000000 row(s) inserted.
>>sh date;
>>drop table t1 cascade;

--- SQL operation complete.
>>sh date;
>>execute x2;

Tags: sql-exe
Changed in trafodion:
assignee: nobody → Mike Hanlon (mike-hanlon)
Revision history for this message
Anoop Sharma (anoop-sharma) wrote :

Insert into table with indexes are transactional inserts even if load syntax is used.
They are running into some kind of dtm cache/memory limit due to
the large number of rows being inserted within a transaction.
It is never recommended to insert so many rows within a transaction.

This issue has been seen before and logs can provide more info
on what is happening but there isn't going to be any quick remedy to this.
It ultimately gets stuck on underlying system, region splits, data flush,
memory overflow and other system related issues.

To insert so many rows, one should first insert them
into the table without the index which will use load to insert.
Once that insert is done, one should create the index on it which
will then internally use load to populate the index.

Or one can disable that index, load the table and enable index after
that (disable index support is not yet in). 'upsert using load' can
also do that internally. There may be an unexternalized option to
do that. Khaled can comment on that.

When we have support for incremental loads, that may alleviate loading
a table with incremental data in non-transactional mode.

We can probably give a warning or an error is one tries to do large
inserts on table with index.

Revision history for this message
Weishiun Tsai (wei-shiun-tsai) wrote :

Just to add more info about the current behavior of this problem. After 12+ hours, the execution of x2 finally came back. But it returned both error[8606] and --- 1000000 row(s) inserted. Presumably the row insert into the table succeeded, but the index population failed.

>>sh date;
Mon Feb 2 21:18:59 UTC 2015
>>execute x2;

*** ERROR[8606] Transaction subsystem TMF returned error 97 on a commit transaction.

--- 1000000 row(s) inserted.
>>sh date;
Tue Feb 3 09:03:47 UTC 2015

Revision history for this message
Weishiun Tsai (wei-shiun-tsai) wrote :
Download full text (3.3 KiB)

This problem is not only for an upsert statement with index. It is obviously a limit that Trafodion can also run into with a delete or an update (without any index). Unlike insert, which has an alternative of using upsert to avoid transactions, delete and update do not similar alternatives. Any attempt to address this problem should also take insert/delete/update into consideration. As shown here, the script below tries to delete 5000 rows from a table. The delete statement ran for 4 hours. After that, it returned a TMF error, a conflicting message saying ‘5000 row(s) deleted’, and the rows were not really deleted. 5000 rows seem to be the limit for this particular table. A delete of 4000 rows ran OK.

------------------------------------------------

Here is the script:

drop schema mytest cascade;
create schema mytest;
set schema mytest;
create table abase like g_wisc32.abase with constraints with partitions;
upsert using load into abase select * from g_wisc32.abase;
select count(*) from abase;
update statistics for table abase on every column sample random 5 percent;
prepare x1 from select count(*) from abase where unique2 < 5000;
prepare x2 from delete from abase where unique2 < 5000;
explain options 'f' x1;
explain options 'f' x2;
sh date;
execute x1;
sh date;
execute x2;
sh date;
select count(*) from abase;

------------------------------------------------

Here is the execution output:

>>drop schema mytest cascade;

--- SQL operation complete.
>>create schema mytest;

--- SQL operation complete.
>>set schema mytest;

--- SQL operation complete.
>>create table abase like g_wisc32.abase with constraints with partitions;

--- SQL operation complete.
>>upsert using load into abase select * from g_wisc32.abase;

--- 32000000 row(s) inserted.
>>select count(*) from abase;

(EXPR)
--------------------

            32000000

--- 1 row(s) selected.
>>update statistics for table abase on every column sample random 5 percent;

--- SQL operation complete.
>>prepare x1 from select count(*) from abase where unique2 < 5000;

--- SQL command prepared.
>>prepare x2 from delete from abase where unique2 < 5000;

--- SQL command prepared.
>>explain options 'f' x1;

LC RC OP OPERATOR OPT DESCRIPTION CARD
---- ---- ---- -------------------- -------- -------------------- ---------

2 . 3 root 1.00E+000
1 . 2 sort_scalar_aggr 1.00E+000
. . 1 trafodion_scan ABASE 3.29E+001

--- SQL operation complete.
>>explain options 'f' x2;

LC RC OP OPERATOR OPT DESCRIPTION CARD
---- ---- ---- -------------------- -------- -------------------- ---------

1 . 2 root o x 3.29E+001
. . 1 trafodion_delete ABASE 3.29E+001

--- SQL operation complete.
>>sh date;
Sun Feb 8 23:09:52 UTC 2015
>>execute x1;

(EXPR)
--------------------

                5000

--- 1 row(s) selected.
>>sh date;
Mon Feb 9 00:14:43 UTC 2015
>>execute x2;

*** ERROR[8606] Transaction ...

Read more...

Revision history for this message
Sandhya Sundaresan (sandhya-sundaresan) wrote :

To avoid running into this, we need a change from TM.
 TM should return an error when it's close to reaching some thrashold after which it is unable to handle a very large transactional operation. It will be iaccurate to rely on compielr estimates to guess that this limit may be reached.

Once TM makes this change, we will add better error handling at the SQL layer to handle this error and return a meaningful error message .

For the specific upsert with index case, refer to the wiki page that talks about how upsert with index or other indexes is a transactional operation. The recommendation is to disable the index before the upsert.
https://wiki.trafodion.org/wiki/index.php/Data_Loading

Changed in trafodion:
importance: Critical → High
Revision history for this message
Sandhya Sundaresan (sandhya-sundaresan) wrote :

Additional details :
We need 3 things from TM:
1) support for IUD operations of higher number of rows. 5-10K seems reasonable that
 users can run within one transaction. Need TM to diagnose what kind of limit it is running
into, is it an architectural issue or a system/configuration issue
2) an indication to be returned at runtime by dtm if a limit or condition is about to be reached that may cause
the IUD operation to fail
3) if that condition is actually reached, to return an error instead of crash/hangs or non-deterministic behavior.

Changed in trafodion:
importance: High → Critical
assignee: Mike Hanlon (mike-hanlon) → Atanu Mishra (atanu-mishra)
Changed in trafodion:
assignee: Atanu Mishra (atanu-mishra) → Joanie Cooper (joanie-cooper)
Changed in trafodion:
status: New → In Progress
Revision history for this message
Joanie Cooper (joanie-cooper) wrote :

A new configurable memory usage threshold and query throttle mechanism has been delivered to the TrxRegionEndpoint coprocessor. This feature provides a throttling mechanism to either post a trace message indicating the memoryUsage for the regionserver process has exceeded the configured threshold or return an error to the coprocessor RPC call.

 For example:

We could detect and send an error when we’ve reached a preconfigured memory usage threshold, e.g. 90%

>>delete from j1 where a = 3;

*** ERROR[8448] Unable to access Hbase interface. Call to ExpHbaseInterface::deleteRow returned error HBASE_ACCESS_ERROR(-705). Cause:
java.io.IOException: org.apache.hadoop.hbase.client.transactional.MemoryUsageException: checkAndDelete memory usage exceeds 90 percent
org.apache.hadoop.hbase.client.transactional.TransactionalTable.checkAndDelete(TransactionalTable.java:361)
org.apache.hadoop.hbase.client.transactional.RMInterface.checkAndDelete(RMInterface.java:196)
org.trafodion.sql.HBaseAccess.HTableClient.checkAndDeleteRow(HTableClient.java:773)
.

--- 0 row(s) deleted.

The default settings for the memoryUsage threshold is 100%. The memoryUsage throttling is set to "warn only" with a boolean value of "true".
The memoryUsage thread is set to sleep for "15" seconds.

To reconfigure these settings use the hbase configuration properties:

"hbase.transaction.memory.threshold" - what is the memoryUsage threshold, default is 100%.
"hbase.transaction.memory.warn.only" - what is the memoryUsage throttling (warning trace message or return error), default is true.
"hbase.transaction.memory.sleep" - what is the new memoryUsage regionserver thread sleep time, default is 15 seconds

Revision history for this message
Joanie Cooper (joanie-cooper) wrote :

We are still waiting to try this configuration on a performance cluster before making a configuration recommendation on the R1.1.0 wiki.

Changed in trafodion:
status: In Progress → Fix Committed
Revision history for this message
Joanie Cooper (joanie-cooper) wrote :

One additional feature was added to allow the regionserver to initiate a garbage collection when the memory usage threshold has been exceeded. The performance team ran with the memory usage threshold setting to 70%. No problems were reported.

Revision history for this message
Joanie Cooper (joanie-cooper) wrote :

For the specific test case documented in this LP bug, the query will run for a very long time.
The query is not hung, it is just executing slowly.

The following hbase-site.xml settings can be used to test the new TrxRegionEndpoint coprocessor memory usage
code:

  "hbase.transaction.memory.threshold" set to a value less than 100.
  "hbase.transaction.memory.warn.only" default is true, set to false if an error message is desired to end the running query.
  "hbase.transaction.memory.perform.GC" default is false. Set to true if a garbage collection should be initiated to try to
  immediately regain memory for the region server, when the memory threshold has been exceeded.

It has been seen that when the query uses the new memory usage threshold settings, the query can finish,
but may still experience a "Unique Constraint" error. This was noted as being a possible problem in SQL
not associated with this particular LP bug.

Revision history for this message
Weishiun Tsai (wei-shiun-tsai) wrote :

This is what's seen on the v0519 build when executing x2. This bug report was originally created hoping that work can be done from the SQL side to ease such resource pressure, or to deal with it internally by disabling the index during the data loading and re-enabling it back after the loading is done. But there is no plan to do so currently. Instead, it was recommended that the user create the index only after the data has been loaded.

We will close this report as 'Won't Fix' for now.

>>sh date;
Wed May 20 18:55:44 UTC 2015
>>execute x2;

*** ERROR[8448] Unable to access Hbase interface. Call to ExpHbaseInterface::rowExists returned error HBASE_ACCESS_ERROR(-705). Cause:
java.io.IOException: java.io.IOException: TrxRegionEndpoint coprocessor: checkAndPut - java.lang.RuntimeException: org.apache.hadoop.hbase.regionserver.Leases$LeaseStillHeldException
org.apache.hadoop.hbase.client.transactional.TransactionalTable.checkAndPut(TransactionalTable.java:450)
org.apache.hadoop.hbase.client.transactional.RMInterface.checkAndPut(RMInterface.java:347)
org.trafodion.sql.HBaseAccess.HTableClient.putRow(HTableClient.java:929)
org.trafodion.sql.HBaseAccess.HTableClient.checkAndInsertRow(HTableClient.java:1005)
.

--- 0 row(s) inserted.
>>sh date;
Wed May 20 20:56:44 UTC 2015

Changed in trafodion:
status: Fix Committed → Won't Fix
To post a comment you must log in.
This report contains Public information  
Everyone can see this information.

Other bug subscribers

Remote bug watches

Bug watches keep track of this bug in other bug trackers.