Incorrect results or error 8442 from hive tables when using query cache and hive data changes
Affects | Status | Importance | Assigned to | Milestone | |
---|---|---|---|---|---|
Trafodion | In Progress | Medium | khaled Bouaziz | | |
Bug Description
Trafodion caches Hive metadata, Hive table statistics, and entire queries on Hive tables. The list of HDFS files to read for a query on a Hive table is stored in the query plan. When the underlying data changes, Trafodion has no reliable way to detect the change, so a cached plan can return stale data. When HDFS files have been removed, we sometimes also see an error like this one:
*** ERROR[8442] Unable to access HDFS interface. Call to ExpLOBInterface
There are several possible solutions to this problem:
1. Leave it up to the user to turn off query caching - current solution. Also, the HIVE_METADATA_
2. Disable the query cache for queries that access Hive data.
3. Improve validation methods for query and metadata caches to detect changes of the underlying HDFS files.
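As a rough illustration of option 3, the sketch below revalidates a cached plan against a snapshot of the table's files before reusing it. It is not Trafodion code: `fingerprint`, `CachedPlan`, `ValidatingPlanCache`, and `compile_plan` are hypothetical names, and local filesystem calls (`os.listdir`, `os.stat`) stand in for the corresponding HDFS directory listing, which is an assumption made to keep the example self-contained.

```python
import os
from dataclasses import dataclass
from typing import Callable, Dict, Tuple

# One (file name, size in bytes, modification time in ns) triple per data file.
Fingerprint = Tuple[Tuple[str, int, int], ...]


def fingerprint(table_dir: str) -> Fingerprint:
    """Snapshot the files under a table directory (local stand-in for HDFS)."""
    entries = []
    for name in sorted(os.listdir(table_dir)):
        st = os.stat(os.path.join(table_dir, name))
        entries.append((name, st.st_size, st.st_mtime_ns))
    return tuple(entries)


@dataclass
class CachedPlan:
    plan: object           # whatever the compiler produced
    snapshot: Fingerprint  # state of the table files at compile time


class ValidatingPlanCache:
    """Hypothetical plan cache that checks the file snapshot on every lookup."""

    def __init__(self) -> None:
        self._entries: Dict[Tuple[str, str], CachedPlan] = {}

    def get_plan(
        self,
        query: str,
        table_dir: str,
        compile_plan: Callable[[str, Fingerprint], object],
    ) -> object:
        current = fingerprint(table_dir)
        key = (query, table_dir)
        entry = self._entries.get(key)
        # Reuse the cached plan only if no file was added, removed, or changed.
        if entry is not None and entry.snapshot == current:
            return entry.plan
        # Otherwise recompile against the current file list and replace the entry.
        plan = compile_plan(query, current)
        self._entries[key] = CachedPlan(plan, current)
        return plan
```

The trade-off in this sketch is one directory listing per cached query lookup, which may or may not be acceptable for large Hive tables; a real implementation would probably need a cheaper check, such as a directory-level modification time.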
Test case:
-- using the Hive shell, create Hive table T1 and enter some data
select * from hive.hive.T1;
-- now update the Hive table, e.g. add more data
select * from hive.hive.T1;
-- the changes are not visible; we get the cached data
cqd query_cache '0';
select * from hive.hive.T1;
-- now the changes are reflected in the result
Changed in trafodion: | |
assignee: | nobody → khaled Bouaziz (khaled-bouaziz) |
status: | New → In Progress |
description: | updated |
Changed in trafodion: | |
status: | Fix Committed → In Progress |
Once this is fixed, we can remove the following from trafodion/core/sql/regress/hive/TEST018:
cqd query_cache '0';