Update stats gets wrong row counts with the first run after loading data
Affects | Status | Importance | Assigned to | Milestone | |
---|---|---|---|---|---|
Trafodion |
In Progress
|
High
|
Barry Fritchman |
Bug Description
This problem has been seen several times in the past few builds, the latest one being the v1209_0830 build. When QA populates the g_tpch2x tables on QA clusters, the scripts load the data, do select count(*) on each table, and then run update stats on each table. The select count(*) prior to the update stats show the correct row counts. But showstats after update stats show that the row count in the stats for each table is way off. Some tables get more row counts than the actual row counts and some less.
Rerunning the same set of update stats statements often correct this situation. But this is causing a huge problem for testing. When the stats are this bad, some of the larger queries using these tables would lapse back to nested join and would then hang for 20 hours without finishing.
This problem can’t be reliably reproduced on all clusters, but it does show up frequently enough to cause problems. This case is created to document this problem. Investigation needs to be done in the implementation of update stats to see why select count(*) gets the correct counts while update stats afterwards does not.
=======
Here is the execution output of the select count(*) statements and the update stats statements, in that order, after the data loading.
SQL>select count(*) from region;
(EXPR)
-------
--- 1 row(s) selected.
SQL>select count(*) from nation;
(EXPR)
-------
--- 1 row(s) selected.
SQL>select count(*) from supplier;
(EXPR)
-------
--- 1 row(s) selected.
SQL>select count(*) from customer;
(EXPR)
-------
--- 1 row(s) selected.
SQL>select count(*) from part;
(EXPR)
-------
--- 1 row(s) selected.
SQL>select count(*) from partsupp;
(EXPR)
-------
--- 1 row(s) selected.
SQL>select count(*) from orders;
(EXPR)
-------
--- 1 row(s) selected.
SQL>select count(*) from lineitem;
(EXPR)
-------
--- 1 row(s) selected.
-------
== TEST: tcase.test003
-------
SQL>update statistics for table region on every column;
--- SQL operation complete.
SQL>update statistics for table nation on every column;
--- SQL operation complete.
SQL>update statistics for table supplier on every column;
--- SQL operation complete.
SQL>update statistics for table customer on every column;
--- SQL operation complete.
SQL>update statistics for table part on every column;
--- SQL operation complete.
SQL>update statistics for table partsupp on every column;
--- SQL operation complete.
SQL>update statistics for table orders on every column sample random 10 percent;
--- SQL operation complete.
SQL>update statistics for table lineitem on every column sample random 10 percent;
--- SQL operation complete.
=======
Here is the showstats output for each table after update stats statements were first run. The row counts are completely off.
>>set schema g_tpch2x;
--- SQL operation complete.
>>showstats for table CUSTOMER on every column;
Histogram data for Table TRAFODION.
Table ID: 314741800727389741
Hist ID # Ints Rowcount UEC Colname(s)
========== ====== =========== =========== =======
1392898339 48 1100767 1100767 C_CUSTKEY
1392898334 62 1100767 1100767 C_NAME
1392898329 62 1100767 1100767 C_ADDRESS
1392898324 25 1100767 0 C_NATIONKEY
1392898319 62 1100767 1100767 C_PHONE
1392898314 36 1100767 6 C_ACCTBAL
1392898309 5 1100767 0 C_MKTSEGMENT
1392898304 62 1100767 0 C_COMMENT
1392898299 8 1100767 0 "_SALT_"
1392898344 1 1100767 1100767 "_SALT_", C_CUSTKEY
--- SQL operation complete.
>>showstats for table LINEITEM on every column;
Histogram data for Table TRAFODION.
Table ID: 314741800727390211
Hist ID # Ints Rowcount UEC Colname(s)
========== ====== =========== =========== =======
1537555901 28 12720858 2555961 L_ORDERKEY
1537555896 52 12720858 399694 L_PARTKEY
1537555891 48 12720858 20000 L_SUPPKEY
1537555886 7 12720858 7 L_LINENUMBER
1537555881 50 12720858 50 L_QUANTITY
1537555876 39 12720858 939869 L_EXTENDEDPRICE
1537555871 11 12720858 11 L_DISCOUNT
1537555866 9 12720858 9 L_TAX
1537555861 3 12720858 3 L_RETURNFLAG
1537555856 2 12720858 2 L_LINESTATUS
1537555851 50 12720858 2526 L_SHIPDATE
1537555846 50 12720858 2466 L_COMMITDATE
1537555841 50 12720858 2550 L_RECEIPTDATE
1537555836 4 12720858 4 L_SHIPINSTRUCT
1537555831 7 12720858 7 L_SHIPMODE
1537555826 62 12720858 7447670 L_COMMENT
1537555821 8 12720858 8 "_SALT_"
1537555906 1 12720858 12720858 "_SALT_", L_SHIPDATE, L_ORDERKEY, L_LINENUMBER
--- SQL operation complete.
>>showstats for table NATION on every column;
Histogram data for Table TRAFODION.
Table ID: 314741800727389515
Hist ID # Ints Rowcount UEC Colname(s)
========== ====== =========== =========== =======
1374739566 25 25 25 N_NATIONKEY
1374739561 25 25 25 N_NAME
1374739556 5 25 5 N_REGIONKEY
1374739551 25 25 25 N_COMMENT
1374739546 8 25 8 "_SALT_"
1374739571 1 25 25 "_SALT_", N_NATIONKEY
--- SQL operation complete.
>>showstats for table ORDERS on every column;
Histogram data for Table TRAFODION.
Table ID: 314741800727390091
Hist ID # Ints Rowcount UEC Colname(s)
========== ====== =========== =========== =======
1464778717 42 4585155 4585155 O_ORDERKEY
1464778712 56 4585155 193098 O_CUSTKEY
1464778707 3 4585155 3 O_ORDERSTATUS
1464778702 57 4585155 4045135 O_TOTALPRICE
1464778697 50 4585155 2406 O_ORDERDATE
1464778692 5 4585155 5 O_ORDERPRIORITY
1464778687 62 4585155 2000 O_CLERK
1464778682 1 4585155 1 O_SHIPPRIORITY
1464778677 62 4585155 4259632 O_COMMENT
1464778672 8 4585155 8 "_SALT_"
1464778722 1 4585155 4585155 "_SALT_", O_ORDERKEY
--- SQL operation complete.
>>showstats for table PART on every column;
Histogram data for Table TRAFODION.
Table ID: 314741800727389857
Hist ID # Ints Rowcount UEC Colname(s)
========== ====== =========== =========== =======
1403671365 48 1725243 1725243 P_PARTKEY
1403671360 62 1725243 1279978 P_NAME
1403671355 5 1725243 0 P_MFGR
1403671350 25 1725243 0 P_BRAND
1403671345 150 1725243 0 P_TYPE
1403671340 50 1725243 0 P_SIZE
1403671335 40 1725243 0 P_CONTAINER
1403671330 55 1725243 0 P_RETAILPRICE
1403671325 76 1725243 0 P_COMMENT
1403671320 8 1725243 0 "_SALT_"
1403671370 1 1725243 1725243 "_SALT_", P_PARTKEY
--- SQL operation complete.
>>showstats for table PARTSUPP on every column;
Histogram data for Table TRAFODION.
Table ID: 314741800727389978
Hist ID # Ints Rowcount UEC Colname(s)
========== ====== =========== =========== =======
1433360836 48 2972264 0 PS_PARTKEY
1433360831 48 2972264 0 PS_SUPPKEY
1433360826 48 2972264 0 PS_AVAILQTY
1433360821 48 2972264 0 PS_SUPPLYCOST
1433360816 62 2972264 2205231 PS_COMMENT
1433360811 8 2972264 0 "_SALT_"
1433360841 1 2972264 2972264 "_SALT_", PS_PARTKEY, PS_SUPPKEY
--- SQL operation complete.
>>showstats for table REGION on every column;
Histogram data for Table TRAFODION.
Table ID: 314741800727389399
Hist ID # Ints Rowcount UEC Colname(s)
========== ====== =========== =========== =======
1369726849 5 5 5 R_REGIONKEY
1369726844 5 5 5 R_NAME
1369726839 5 5 5 R_COMMENT
1369726834 3 5 3 "_SALT_"
1369726854 1 5 5 "_SALT_", R_REGIONKEY
--- SQL operation complete.
>>showstats for table SUPPLIER on every column;
Histogram data for Table TRAFODION.
Table ID: 314741800727389626
Hist ID # Ints Rowcount UEC Colname(s)
========== ====== =========== =========== =======
1380401507 48 64035 64035 S_SUPPKEY
1380401502 62 64035 64035 S_NAME
1380401497 62 64035 64035 S_ADDRESS
1380401492 25 64035 0 S_NATIONKEY
1380401487 62 64035 64035 S_PHONE
1380401482 56 64035 0 S_ACCTBAL
1380401477 62 64035 47479 S_COMMENT
1380401472 8 64035 0 "_SALT_"
1380401512 1 64035 64035 "_SALT_", S_SUPPKEY
--- SQL operation complete.
=======
Here is the showstats output after rerunning the same set of update stats statements. The row counts are now correct.
>>set schema g_tpch2x;
--- SQL operation complete.
>>showstats for table CUSTOMER on every column;
Histogram data for Table TRAFODION.
Table ID: 314741800727389741
Hist ID # Ints Rowcount UEC Colname(s)
========== ====== =========== =========== =======
1392898338 48 300000 300000 C_CUSTKEY
1392898335 62 300000 300000 C_NAME
1392898328 62 300000 300000 C_ADDRESS
1392898325 25 300000 25 C_NATIONKEY
1392898318 62 300000 300000 C_PHONE
1392898315 36 300000 262499 C_ACCTBAL
1392898308 5 300000 5 C_MKTSEGMENT
1392898305 62 300000 299263 C_COMMENT
1392898298 8 300000 8 "_SALT_"
1392898345 1 300000 300000 "_SALT_", C_CUSTKEY
--- SQL operation complete.
>>showstats for table LINEITEM on every column;
Histogram data for Table TRAFODION.
Table ID: 314741800727390211
Hist ID # Ints Rowcount UEC Colname(s)
========== ====== =========== =========== =======
1537555900 27 11997996 2536382 L_ORDERKEY
1537555897 60 11997996 399669 L_PARTKEY
1537555890 48 11997996 20000 L_SUPPKEY
1537555887 7 11997996 7 L_LINENUMBER
1537555880 50 11997996 50 L_QUANTITY
1537555877 40 11997996 940199 L_EXTENDEDPRICE
1537555870 11 11997996 11 L_DISCOUNT
1537555867 9 11997996 9 L_TAX
1537555860 3 11997996 3 L_RETURNFLAG
1537555857 2 11997996 2 L_LINESTATUS
1537555850 50 11997996 2526 L_SHIPDATE
1537555847 50 11997996 2466 L_COMMITDATE
1537555840 50 11997996 2549 L_RECEIPTDATE
1537555837 4 11997996 4 L_SHIPINSTRUCT
1537555830 7 11997996 7 L_SHIPMODE
1537555827 62 11997996 7101582 L_COMMENT
1537555820 8 11997996 8 "_SALT_"
1537555907 1 11997996 11997996 "_SALT_", L_SHIPDATE, L_ORDERKEY, L_LINENUMBER
--- SQL operation complete.
>>showstats for table NATION on every column;
Histogram data for Table TRAFODION.
Table ID: 314741800727389515
Hist ID # Ints Rowcount UEC Colname(s)
========== ====== =========== =========== =======
1374739567 25 25 25 N_NATIONKEY
1374739560 25 25 25 N_NAME
1374739557 5 25 5 N_REGIONKEY
1374739550 25 25 25 N_COMMENT
1374739547 8 25 8 "_SALT_"
1374739570 1 25 25 "_SALT_", N_NATIONKEY
--- SQL operation complete.
>>showstats for table ORDERS on every column;
Histogram data for Table TRAFODION.
Table ID: 314741800727390091
Hist ID # Ints Rowcount UEC Colname(s)
========== ====== =========== =========== =======
1464778716 30 3000000 3000000 O_ORDERKEY
1464778713 59 3000000 191492 O_CUSTKEY
1464778706 3 3000000 3 O_ORDERSTATUS
1464778703 53 3000000 2759999 O_TOTALPRICE
1464778696 50 3000000 2406 O_ORDERDATE
1464778693 5 3000000 5 O_ORDERPRIORITY
1464778686 62 3000000 2000 O_CLERK
1464778683 1 3000000 1 O_SHIPPRIORITY
1464778676 62 3000000 2898572 O_COMMENT
1464778673 8 3000000 8 "_SALT_"
1464778723 1 3000000 3000000 "_SALT_", O_ORDERKEY
--- SQL operation complete.
>>showstats for table PART on every column;
Histogram data for Table TRAFODION.
Table ID: 314741800727389857
Hist ID # Ints Rowcount UEC Colname(s)
========== ====== =========== =========== =======
1403671364 48 400000 400000 P_PARTKEY
1403671361 62 400000 399982 P_NAME
1403671354 5 400000 5 P_MFGR
1403671351 25 400000 25 P_BRAND
1403671344 150 400000 150 P_TYPE
1403671341 50 400000 50 P_SIZE
1403671334 40 400000 40 P_CONTAINER
1403671331 55 400000 22097 P_RETAILPRICE
1403671324 76 400000 69074 P_COMMENT
1403671321 8 400000 8 "_SALT_"
1403671371 1 400000 400000 "_SALT_", P_PARTKEY
--- SQL operation complete.
>>showstats for table PARTSUPP on every column;
Histogram data for Table TRAFODION.
Table ID: 314741800727389978
Hist ID # Ints Rowcount UEC Colname(s)
========== ====== =========== =========== =======
1433360837 48 1600000 400000 PS_PARTKEY
1433360830 48 1600000 20000 PS_SUPPKEY
1433360827 48 1600000 9999 PS_AVAILQTY
1433360820 48 1600000 99901 PS_SUPPLYCOST
1433360817 62 1600000 1599985 PS_COMMENT
1433360810 8 1600000 8 "_SALT_"
1433360840 1 1600000 1600000 "_SALT_", PS_PARTKEY, PS_SUPPKEY
--- SQL operation complete.
>>showstats for table REGION on every column;
Histogram data for Table TRAFODION.
Table ID: 314741800727389399
Hist ID # Ints Rowcount UEC Colname(s)
========== ====== =========== =========== =======
1369726848 5 5 5 R_REGIONKEY
1369726845 5 5 5 R_NAME
1369726838 5 5 5 R_COMMENT
1369726835 3 5 3 "_SALT_"
1369726855 1 5 5 "_SALT_", R_REGIONKEY
--- SQL operation complete.
>>showstats for table SUPPLIER on every column;
Histogram data for Table TRAFODION.
Table ID: 314741800727389626
Hist ID # Ints Rowcount UEC Colname(s)
========== ====== =========== =========== =======
1380401506 48 20000 20000 S_SUPPKEY
1380401503 62 20000 20000 S_NAME
1380401496 62 20000 20000 S_ADDRESS
1380401493 25 20000 25 S_NATIONKEY
1380401486 62 20000 20000 S_PHONE
1380401483 56 20000 19803 S_ACCTBAL
1380401476 62 20000 19972 S_COMMENT
1380401473 8 20000 8 "_SALT_"
1380401513 1 20000 20000 "_SALT_", S_SUPPKEY
--- SQL operation complete.
Changed in trafodion: | |
milestone: | none → r1.0 |
Changed in trafodion: | |
milestone: | r1.0 → r1.1 |
Changed in trafodion: | |
milestone: | r1.1 → r2.0 |
Barry, Doesn't update stats gets the count through an internal interface to hbase/hdfs?
Are there times when that may not be correct if some data has not been flushed or some
other reason?
Assigning to Barry.