Trafodion

Query returned wrong result due to wrong row counts in indexes

Bug #1396793 reported by Weishiun Tsai on 2014-11-26

This bug affects 1 person

Affects		Status	Importance	Assigned to	Milestone
	Trafodion	Fix Released	Critical	Mike Hanlon	Trafodion r1.0

Bug Description

The following query uses QA global tables from g_hcubedb. The tables were not rebuilt between the v1119_0830 build and the v1125_0830 build. The stats information show that the stats were created from the 1st build and remain the same in the 2nd build. However, the plan chosen in the v1119_0830 build was a parallel one and it returned 1000 as the result. The plan chosen in the v1125_0830 build is a serial one and it returns 0, which is a wrong result.

This is a data correctness issue and a regression, so it is marked as Critical.

------------------------------------------------------------------------------

Here is the query to reproduce this problem. It requires QA’s g_hcubedb tables.

prepare s13 from select count(*) from trafodion.g_hcubedb.T1,trafodion.g_hcubedb.T2,trafodion.g_hcubedb.T3,trafodion.g_hcubedb.T4,trafodion.g_hcubedb.T5,trafodion.g_hcubedb.T6,trafodion.g_hcubedb.T7,trafodion.g_hcubedb.T8,trafodion.g_hcubedb.T9,trafodion.g_hcubedb.CUBE1 where
trafodion.g_hcubedb.CUBE1.a=trafodion.g_hcubedb.T9.a and trafodion.g_hcubedb.CUBE1.b=trafodion.g_hcubedb.T8.b and trafodion.g_hcubedb.CUBE1.c=trafodion.g_hcubedb.T7.c and
trafodion.g_hcubedb.T9.b=trafodion.g_hcubedb.T1.b and trafodion.g_hcubedb.T9.c=trafodion.g_hcubedb.T2.c and
trafodion.g_hcubedb.T8.a=trafodion.g_hcubedb.T3.a and trafodion.g_hcubedb.T8.c=trafodion.g_hcubedb.T4.c and
trafodion.g_hcubedb.T7.a=trafodion.g_hcubedb.T5.a and trafodion.g_hcubedb.T7.b=trafodion.g_hcubedb.T6.a;

explain options 'f' s13;

execute s13;

------------------------------------------------------------------------------

Here is the row count information of the tables involved, showstats for CUBE1, and STATS_TIME from SB_HISTOGRAMS showing that the stats have not changed between the two builds.

>>set schema g_hcubedb;

--- SQL operation complete.
>>select count(*) from T1;

(EXPR)
--------------------

--- 1 row(s) selected.
>>select count(*) from T2;

(EXPR)
--------------------

--- 1 row(s) selected.
>>select count(*) from T3;

(EXPR)
--------------------

--- 1 row(s) selected.
>>select count(*) from T4;

(EXPR)
--------------------

--- 1 row(s) selected.
>>select count(*) from T5;

(EXPR)
--------------------

--- 1 row(s) selected.
>>select count(*) from T6;

(EXPR)
--------------------

100

--- 1 row(s) selected.
>>select count(*) from T7;

(EXPR)
--------------------

100

--- 1 row(s) selected.
>>select count(*) from T8;

(EXPR)
--------------------

1000

--- 1 row(s) selected.
>>select count(*) from T9;

(EXPR)
--------------------

10000

--- 1 row(s) selected.
>>select count(*) from CUBE1;

(EXPR)
--------------------

100000

--- 1 row(s) selected.
>>showstats for table CUBE1 on every column;

Histogram data for Table TRAFODION.G_HCUBEDB.CUBE1
Table ID: 53531328855223099

Hist ID # Ints Rowcount UEC Colname(s)
========== ====== =========== =========== ===========================
1491884843 8 100000 8 "_SALT_"
1491884838 10 100000 10 A
1491884833 76 100000 100 B
1491884828 76 100000 100 C
1491884823 10 100000 10 D
1491884818 76 100000 100 E
1491884813 76 100000 100 F
1491884808 1 100000 1 TXT
1491884913 1 100000 100000 "_SALT_", A, B, C
1491884908 1 100000 100000 D, "_SALT_", A, B, C
1491884903 1 100000 8000 D, "_SALT_", A, B
1491884898 1 100000 80 D, "_SALT_", A
1491884893 1 100000 80 D, "_SALT_"
1491884888 1 100000 100000 E, "_SALT_", A, B, C
1491884883 1 100000 8000 E, "_SALT_", A, B
1491884878 1 100000 8000 E, "_SALT_", A
1491884873 1 100000 800 E, "_SALT_"
1491884868 1 100000 100000 F, "_SALT_", A, B, C
1491884863 1 100000 100000 F, "_SALT_", A, B
1491884858 1 100000 8000 F, "_SALT_", A
1491884853 1 100000 800 F, "_SALT_"

--- SQL operation complete.
>>select distinct(STATS_TIME) from SB_HISTOGRAMS;

STATS_TIME
-------------------

2014-11-20 06:57:44
2014-11-20 06:58:09
2014-11-20 06:58:16
2014-11-20 06:58:23
2014-11-20 06:58:30
2014-11-20 06:58:37
2014-11-20 06:58:43
2014-11-20 06:59:02
2014-11-20 06:59:10
2014-11-20 06:59:24
2014-11-20 06:59:36
2014-11-20 06:59:49
2014-11-20 07:00:06
2014-11-20 07:00:12
2014-11-20 07:00:19
2014-11-20 07:01:04
2014-11-20 07:01:45
2014-11-20 07:09:58

--- 18 row(s) selected.

------------------------------------------------------------------------------

Here is the execution output showing the in the v1125_0830 build, the query plan is a serial one, and it returns 0, which is a wrong result.

>>prepare s13 from select count(*) from trafodion.g_hcubedb.T1,trafodion.g_hcubedb.T2,trafodion.g_hcubedb.T3,trafodion.g_hcubedb.T4,trafodion.g_hcubedb.T5,trafodion.g_hcubedb.T6,trafodion.g_hcubedb.T7,trafodion.g_hcubedb.T8,trafodion.g_hcubedb.T9,trafodion.g_hcubedb.CUBE1 where
+> trafodion.g_hcubedb.CUBE1.a=trafodion.g_hcubedb.T9.a and trafodion.g_hcubedb.CUBE1.b=trafodion.g_hcubedb.T8.b and trafodion.g_hcubedb.CUBE1.c=trafodion.g_hcubedb.T7.c and
+> trafodion.g_hcubedb.T9.b=trafodion.g_hcubedb.T1.b and trafodion.g_hcubedb.T9.c=trafodion.g_hcubedb.T2.c and
+> trafodion.g_hcubedb.T8.a=trafodion.g_hcubedb.T3.a and trafodion.g_hcubedb.T8.c=trafodion.g_hcubedb.T4.c and
+> trafodion.g_hcubedb.T7.a=trafodion.g_hcubedb.T5.a and trafodion.g_hcubedb.T7.b=trafodion.g_hcubedb.T6.a;

--- SQL command prepared.
>>explain options 'f' s13;

LC RC OP OPERATOR OPT DESCRIPTION CARD
---- ---- ---- -------------------- -------- -------------------- ---------

20 . 21 root 19 . 20 sort_scalar_aggr 18 1 19 hybrid_hash_join 17 2 18 hybrid_hash_join 16 3 17 hybrid_hash_join 15 4 16 hybrid_hash_join 14 5 15 hybrid_hash_join 13 6 14 hybrid_hash_join 12 7 13 hybrid_hash_join 11 8 12 hybrid_hash_join 10 9 11 hybrid_hash_join . . 10 trafodion_scan . . 9 trafodion_scan . . 8 trafodion_scan . . 7 trafodion_index_scan . . 6 trafodion_scan . . 5 trafodion_scan . . 4 trafodion_scan . . 3 trafodion_scan . . 2 trafodion_scan . . 1 trafodion_scan 1.00E+000
1.00E+000
1.00E+001
1.00E+001
1.00E+003
1.00E+002
1.00E+002
1.00E+003
1.00E+003
1.00E+002
1.00E+002
T9 1.00E+004
T1 1.00E+001
T2 1.00E+001
CUBE1 1.00E+005
T7 1.00E+002
T5 1.00E+001
T6 1.00E+002
T8 1.00E+003
T3 1.00E+001
T4 1.00E+001

--- SQL operation complete.
>>execute s13;

(EXPR)
--------------------

--- 1 row(s) selected.

------------------------------------------------------------------------------

Here is the execution output showing the in the v1119_0830 build, the query plan was a parallel one, and it returned the correct result 1000.

SQL>prepare s13 from select count(*) from trafodion.g_hcubedb.T1,trafodion.g_hcubedb.T2,trafodion.g_hcubedb.T3,trafodion.g_hcubedb.T4,trafodion.g_hcubedb.T5,trafodion.g_hcubedb.T6,trafodion.g_hcubedb.T7,trafodion.g_hcubedb.T8,trafodion.g_hcubedb.T9,trafodion.g_hcubedb.CUBE1 where
trafodion.g_hcubedb.CUBE1.a=trafodion.g_hcubedb.T9.a and trafodion.g_hcubedb.CUBE1.b=trafodion.g_hcubedb.T8.b and trafodion.g_hcubedb.CUBE1.c=trafodion.g_hcubedb.T7.c and
trafodion.g_hcubedb.T9.b=trafodion.g_hcubedb.T1.b and trafodion.g_hcubedb.T9.c=trafodion.g_hcubedb.T2.c and
trafodion.g_hcubedb.T8.a=trafodion.g_hcubedb.T3.a and trafodion.g_hcubedb.T8.c=trafodion.g_hcubedb.T4.c and
trafodion.g_hcubedb.T7.a=trafodion.g_hcubedb.T5.a and trafodion.g_hcubedb.T7.b=trafodion.g_hcubedb.T6.a;
--- SQL command prepared.

SQL>explain options 'f' s13;

LC RC OP OPERATOR OPT DESCRIPTION CARD
---- ---- ---- -------------------- -------- -------------------- ---------

31 . 32 root 1.00E+000
30 . 31 sort_partial_aggr_ro 1.00E+000
29 . 30 esp_exchange 1:2(hash2) 1.00E+000
28 . 29 sort_partial_aggr_le 1.00E+000
27 2 28 hybrid_hash_join 1.00E+001
26 4 27 hybrid_hash_join 1.00E+001
25 6 26 hybrid_hash_join 1.00E+003
24 8 25 hybrid_hash_join 1.00E+002
23 10 24 hybrid_hash_join 1.00E+002
22 12 23 hybrid_hash_join 1.00E+003
21 14 22 hybrid_hash_join 1.00E+003
20 16 21 hybrid_hash_join 1.00E+002
19 18 20 hybrid_hash_join 1.00E+002
. . 19 trafodion_scan T9 1.00E+004
17 . 18 esp_exchange 2(rep-b):2(hash2) 1.00E+001
. . 17 trafodion_scan T1 1.00E+001
15 . 16 esp_exchange 2(rep-b):2(hash2) 1.00E+001
. . 15 trafodion_scan T2 1.00E+001
13 . 14 esp_exchange 2(hash2):2(hash2) 1.00E+005
. . 13 trafodion_scan CUBE1 1.00E+005
11 . 12 esp_exchange 2(rep-b):2(hash2) 1.00E+002
. . 11 trafodion_scan T7 1.00E+002
9 . 10 esp_exchange 2(rep-b):2(hash2) 1.00E+001
. . 9 trafodion_scan T5 1.00E+001
7 . 8 esp_exchange 2(rep-b):2(hash2) 1.00E+002
. . 7 trafodion_scan T6 1.00E+002
5 . 6 esp_exchange 2(rep-b):2(hash2) 1.00E+003
. . 5 trafodion_scan T8 1.00E+003
3 . 4 esp_exchange 2(rep-b):2(hash2) 1.00E+001
. . 3 trafodion_scan T3 1.00E+001
1 . 2 esp_exchange 2(rep-b):2(hash2) 1.00E+001
. . 1 trafodion_scan T4 1.00E+001

--- SQL operation complete.

SQL>execute s13;
(EXPR)
--------------------
1000

--- 1 row(s) selected.

Tags:

QF Chen (qifan-chen) on 2014-12-04

Changed in trafodion:
status:	New → In Progress
assignee:	nobody → QF Chen (qifan-chen)

Revision history for this message

QF Chen (qifan-chen) wrote on 2014-12-05:

Simplify the problem to the # of rows in index IXCUBE1.

>>set parserflags 1;

--- SQL operation complete.
>>select count(*) from table(index_table IXCUBE1F);

(EXPR)
--------------------

6573

--- 1 row(s) selected.
>>select count(*) from cube1;

(EXPR)
--------------------

13643

--- 1 row(s) selected.

For some reason, the # of rows in index is less!

here is the query plan for upsert which loads the data into CUBE1.

>>explain options 'f' xx;

LC RC OP OPERATOR OPT DESCRIPTION CARD
---- ---- ---- -------------------- -------- -------------------- ---------

--- SQL operation complete.

Simplify the problem to the # of rows in index IXCUBE1.

>>set parserflags 1;

--- SQL operation complete.
>>select count(*) from table(index_table IXCUBE1F);

(EXPR)
--------------------

6573

--- 1 row(s) selected.
>>select count(*) from cube1;

(EXPR)
--------------------

13643

--- 1 row(s) selected.

For some reason, the # of rows in index is less!

here is the query plan for upsert which loads the data into CUBE1.

>>explain options 'f' xx;

LC   RC   OP   OPERATOR              OPT       DESCRIPTION           CARD
---- ---- ---- --------------------  --------  --------------------  ---------

18   .    19   root                            x                     3.02E+005
17   .    18   esp_exchange                    1:3(hash2)            3.02E+005
11   16   17   nested_join                                           3.02E+005
14   15   16   merge_union                                           3.00E+000
.    .    15   trafodion_insert                IXCUBE1F              1.00E+000
12   13   14   merge_union                                           2.00E+000
.    .    13   trafodion_insert                IXCUBE1E              1.00E+000
.    .    12   trafodion_insert                IXCUBE1D              1.00E+000
9    10   11   nested_join                                           1.00E+005
.    .    10   trafodion_                      CUBE1                 1.00E+000
8    .    9    sort                                                  1.00E+005
7    2    8    hybrid_hash_join                                      1.00E+005
6    4    7    hybrid_hash_join                                      1.00E+003
5    .    6    esp_exchange                    3(hash2):2(hash2)     1.00E+002
.    .    5    trafodion_scan                  T8                    1.00E+002
3    .    4    esp_exchange                    3(rep-b):2(hash2)     1.00E+001
.    .    3    trafodion_scan                  T1                    1.00E+001
1    .    2    esp_exchange                    3(rep-b):1            1.00E+002
.    .    1    trafodion_index_scan            T6                    1.00E+002

--- SQL operation complete.

Revision history for this message

QF Chen (qifan-chen) wrote on 2014-12-05:

The plan change from base table scan of CUBE1 to index scan of IDXCUBE1F is good, because

1. both are full scan
2. row length of IDXCUBE1F is shorter

Missing row issue is with the population of the indexes.

Revision history for this message

Mike Hanlon (mike-hanlon) wrote on 2014-12-05:

load Edit (1.1 KiB, text/plain)

I reproduced this problem too. The resulting row counts for base table and indexes are all different:

>>set parserflags 1;

--- SQL operation complete.
>>select count(*) from table(index_table IXCUBE1F);

(EXPR)
--------------------

13026

--- 1 row(s) selected.
>>select count(*) from table(index_table IXCUBE1E);

(EXPR)
--------------------

13047

--- 1 row(s) selected.
>>select count(*) from table(index_table IXCUBE1D);

(EXPR)
--------------------

12991

--- 1 row(s) selected.
>>select count(*) from cube1;

(EXPR)
--------------------

50000

--- 1 row(s) selected.

These row counts from the indexes are confirmed by querying hbase directly.

Also, notice base table cube1 has 50,000 rows are the end of the upsert, but we should expect 100,000.

Finally, notice that RMS stats for the query when it finished makes it appear that all 100,000 rows flowed thru the query tree, to each of the indexes. See attached log file.

Also, to reproduce, obey attached files ghcube.ddl and load.

Revision history for this message

Mike Hanlon (mike-hanlon) wrote on 2014-12-05:

ghcube.ddl Edit (3.6 KiB, text/plain)

Revision history for this message

Mike Hanlon (mike-hanlon) wrote on 2014-12-05:

end_stats.log Edit (10.1 KiB, text/plain)

Revision history for this message

Mike Hanlon (mike-hanlon) wrote on 2014-12-08:

Since it takes 2 hours+ to run this, I tried to simplify the test case to use 10% of the rows. No problem with 10% or 33%. It did, reproduce when I loaded 100%. This time, I put "begin work" before the upsert and did the select count(*) after the upsert and before the commit. I saw the same results before the commit and after it:

>>set parserflags 1;

--- SQL operation complete.
>>select count(*) from table(index_table IXCUBE1F);

(EXPR)
--------------------

33675

--- 1 row(s) selected.
>>select count(*) from table(index_table IXCUBE1E);

(EXPR)
--------------------

33703

--- 1 row(s) selected.
>>select count(*) from table(index_table IXCUBE1D);

(EXPR)
--------------------

33675

--- 1 row(s) selected.
>>select count(*) from cube1;

(EXPR)
--------------------

35870

--- 1 row(s) selected.
>>commit;

--- SQL operation complete.
>>select count(*) from table(index_table IXCUBE1F);

(EXPR)
--------------------

33675

--- 1 row(s) selected.
>>select count(*) from table(index_table IXCUBE1E);

(EXPR)
--------------------

33703

--- 1 row(s) selected.
>>select count(*) from table(index_table IXCUBE1D);

(EXPR)
--------------------

33675

--- 1 row(s) selected.
>>select count(*) from cube1;

(EXPR)
--------------------

35870

--- 1 row(s) selected.
>>

QF Chen (qifan-chen) on 2014-12-08

Changed in trafodion:
assignee:	QF Chen (qifan-chen) → Mike Hanlon (mike-hanlon)

Sandhya Sundaresan (sandhya-sundaresan) on 2014-12-09

tags:

added: sql-exe
removed: sql-cmp

Revision history for this message

Mike Hanlon (mike-hanlon) wrote on 2014-12-18:

It looks like the fix for 1380734 has also fixed 1396793 .

Changed in trafodion:
status:	In Progress → Fix Committed

Weishiun Tsai (wei-shiun-tsai) on 2014-12-20

Changed in trafodion:
milestone:	none → r1.0

Revision history for this message

Weishiun Tsai (wei-shiun-tsai) wrote on 2015-01-14:

Download full text (6.0 KiB)

This is a problem thought to have been fixed. But on the v0113 build, we are still seeing it. The following output shows the table/index creation and the execution of the query that returns the wrong result. 3 indexes ixcube1d, ixcube1e, and ixcube1f were created and populated. But select count(*) statements from the indexes afterwards showed that ixcube1f and ixcube1e both have 0 rows. The actual query s3 itself is still returning the wrong result. I am reopening this case.

-----------------------------------------------------------------

Here is the table/index creation output:

SQL>create table cube1
(a int not null not droppable,
b int not null not droppable,
c int not null not droppable,
d int, e int, f int, txt char(100),
primary key (a,b,c))
store by primary key salt using 8 partitions;
--- SQL operation complete.

<.....>

SQL>upsert using load into cube1 select t1.a, t6.a, t8.a, t1.a, t6.a, t8.a, 'some text'
from t1, t6, t8 where t8.a < 100;
--- SQL operation complete.

<.....>

SQL>create index ixcube1d on cube1(d)
attribute extent (30, 30), maxextents 400
hash partition
no populate
;
--- SQL operation complete.

SQL>create index ixcube1e on cube1(e)
attribute extent (30, 30), maxextents 400
hash partition
no populate
;
--- SQL operation complete.

SQL>create index ixcube1f on cube1(f)
attribute extent (30, 30), maxextents 400
hash partition
no populate
;
--- SQL operation complete.

SQL>populate index ixcube1d on cube1;
--- SQL operation complete.

SQL>populate index ixcube1e on cube1;
--- SQL operation complete.

SQL>populate index ixcube1f on cube1;
--- SQL operation complete.

-----------------------------------------------------------------

Here is execution output showing that the query still returns the wrong result, and 2 of the indexes have 0 rows:

>>set schema g_hcubedb;

--- SQL operation complete.
>>
>>prepare s13 from select count(*) from trafodion.g_hcubedb.T1,trafodion.g_hcubedb.T2,trafodion.g_hcubedb.T3,trafodion.g_hcubedb.T4,trafodion.g_hcubedb.T5,trafodion.g_hcubedb.T6,trafodion.g_hcubedb.T7,trafodion.g_hcubedb.T8,trafodion.g_hcubedb.T9,trafodion.g_hcubedb.CUBE1 where
+> trafodion.g_hcubedb.CUBE1.a=trafodion.g_hcubedb.T9.a and trafodion.g_hcubedb.CUBE1.b=trafodion.g_hcubedb.T8.b and trafodion.g_hcubedb.CUBE1.c=trafodion.g_hcubedb.T7.c and
+> trafodion.g_hcubedb.T9.b=trafodion.g_hcubedb.T1.b and trafodion.g_hcubedb.T9.c=trafodion.g_hcubedb.T2.c and
+> trafodion.g_hcubedb.T8.a=trafodion.g_hcubedb.T3.a and trafodion.g_hcubedb.T8.c=trafodion.g_hcubedb.T4.c and
+> trafodion.g_hcubedb.T7.a=trafodion.g_hcubedb.T5.a and trafodion.g_hcubedb.T7.b=trafodion.g_hcubedb.T6.a;

*** WARNING[6008] Statistics for column (A) from table TRAFODION.G_HCUBEDB.CUBE1 were not available. As a result, the access path chosen might not be the best possible.

*** WARNING[6008] Statistics for column (B) from table TRAFODION.G_HCUBEDB.CUBE1 were not available. As a result, the access path chosen might not be the best possible.

*** WARNING[6008] Statistics for column (C) from table TRAFODION.G_HCUBEDB.CUBE1 were not available. As a result, the access path chosen might not be the best possible.

--- SQL command prepared.
>>
>>expla...

This is a problem thought to have been fixed.  But on the v0113 build, we are still seeing it.  The following output shows the table/index creation and the execution of the query that returns the wrong result.   3 indexes ixcube1d, ixcube1e, and ixcube1f were created and populated.  But select count(*) statements from the indexes afterwards showed that ixcube1f and ixcube1e both have 0 rows.  The actual query s3 itself is still returning the wrong result.  I am reopening this case.