Incorrect results or error 8442 from hive tables when using query cache and hive data changes

Bug #1396386 reported by Hans Zeller
This bug affects 2 people
Affects: Trafodion
Status: In Progress
Importance: Medium
Assigned to: khaled Bouaziz

Bug Description

Trafodion caches Hive metadata, Hive table statistics, and entire queries on Hive tables. The list of HDFS files to read for a query on a Hive table is stored in the query plan. When the underlying data changes, Trafodion has no reliable way to detect the change, so it returns stale data. In some cases, when HDFS files have been removed, we also see an error like this one:

*** ERROR[8442] Unable to access HDFS interface. Call to ExpLOBInterfaceSelectCursor/open returned error LOB_DATA_FILE_OPEN_ERROR(508). Error detail 0.

There are several possible solutions to this problem:

1. Leave it up to the user to turn off query caching (the current solution). HIVE_METADATA_REFRESH_INTERVAL can also be used to control caching of HDFS file statistics.

2. Disable the query cache for queries that access Hive data.

3. Improve validation methods for query and metadata caches to detect changes of the underlying HDFS files.
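To make option 3 more concrete, here is a minimal sketch of a plan cache that validates itself against the data files before reuse. It is not Trafodion code: the names `snapshot`, `CachedPlan` and `ValidatingPlanCache` are hypothetical, the local filesystem stands in for HDFS, and comparing path, size and modification time is only one possible way to detect changes.

```python
import os
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

# One data file: (path, size in bytes, modification time in nanoseconds).
FileStamp = Tuple[str, int, int]


def snapshot(table_dir: str) -> List[FileStamp]:
    """Fingerprint a table directory (local stand-in for an HDFS directory)."""
    stamps = []
    for name in sorted(os.listdir(table_dir)):
        path = os.path.join(table_dir, name)
        if os.path.isfile(path):
            st = os.stat(path)
            stamps.append((path, st.st_size, st.st_mtime_ns))
    return stamps


@dataclass
class CachedPlan:
    files: List[FileStamp]  # file fingerprint the plan was compiled against
    plan: object            # whatever the (hypothetical) compiler returned


class ValidatingPlanCache:
    """Query cache that recompiles when the underlying files have changed."""

    def __init__(self, compile_fn: Callable[[str, List[str]], object]):
        self._compile = compile_fn
        self._plans: Dict[str, CachedPlan] = {}
        self.recompiles = 0  # counter, only here to make the behavior visible

    def get(self, query: str, table_dir: str) -> object:
        current = snapshot(table_dir)
        entry = self._plans.get(query)
        # Reuse the plan only if the file list, sizes and mtimes still match.
        if entry is not None and entry.files == current:
            return entry.plan
        # Files were added, removed or rewritten: the cached plan is stale.
        self.recompiles += 1
        plan = self._compile(query, [path for path, _, _ in current])
        self._plans[query] = CachedPlan(current, plan)
        return plan
```

With a check like this, a plan whose file list no longer matches would be recompiled instead of being run against missing or outdated files. The cost is one directory listing per cache lookup, which is the trade-off options 1 and 2 avoid by not caching at all.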

Test case:

-- using the Hive shell, create Hive table T1 and enter some data
select * from hive.hive.T1;
-- now update the Hive table, e.g. add more data
select * from hive.hive.T1;
-- the changes will not be seen, we get the cached data
cqd query_cache '0';
select * from hive.hive.T1;
-- now the changes are reflected in the result

Tags: sql-cmp
Hans Zeller (hans-zeller) wrote :

Once this is fixed, we can remove the following from trafodion/core/sql/regress/hive/TEST018:

cqd query_cache '0';

Changed in trafodion:
assignee: nobody → khaled Bouaziz (khaled-bouaziz)
status: New → In Progress
description: updated
Trafodion-Gerrit (neo-devtools) wrote : Fix merged to core (master)

Reviewed: https://review.trafodion.org/734
Committed: https://github.com/trafodion/core/commit/b6fc0094fdc567785265f75250a060fa835fa5eb
Submitter: Trafodion Jenkins
Branch: master

commit b6fc0094fdc567785265f75250a060fa835fa5eb
Author: Khaled Bouaziz <email address hidden>
Date: Tue Nov 25 22:51:26 2014 +0000

    fix for hive/test018

    Query caching seems to cause hive/test018 to fail
    disabling it for now till the query caching
    issue is fixed

    bug 1396386

    Change-Id: Ife834050ce8f7d7b968488f5a5ff3f189ac2b666

Changed in trafodion:
status: In Progress → Fix Committed
Changed in trafodion:
status: Fix Committed → In Progress
khaled Bouaziz (khaled-bouaziz) wrote :

Option 3 seems to be the best:
3. Improve validation methods for query and metadata caches to detect changes of the underlying HDFS files.

