ExpHbaseInterface::nextRow sees OutOfOrderScannerNextException on hbase 0.98
Affects | Status | Importance | Assigned to | Milestone |
---|---|---|---|---|
Trafodion | Fix Released | Critical | Atanu Mishra | |
Bug Description
When running QA regression tests, ExpHbaseInterfa
SQL>Insert Into BTA1P001
Values (
'ABAA', 0, 0, 'CAAAAAAA', -- (0)
68, 'AAAA', 1, 0,
2, 'AA', 2,
11, 3, 'BCAAHAAAAAAAAA
11,
.0, .2, 'ABAA', .6, .0,
0, 8, 8, 'AA', 68, -- (5)
6, 'CBAAAAAA', 6,
1, .1,
'BIAAAAABAAAAAAAA', 1.1, 11, 626,
'BCAAHAAAAAAAAAAA', .0,
'BIAAAAAB', 8, 8, 10,
'ABAA', 10, .00011, .00011 , 11,
'CA', 626,
'AB', 6, 12,
'ABAA', 1968, 468, 69, 1, 9, 9,
'CB', 1, .03,
'ABAA', 11, .01, .06, 6.26,
'BIAAAAAB', 1968, 8,
'CBAAAAAAAAAAAAAA', 68, 2369,
'CBAAAAAAAAAAAAAA', 18, 18, 3,
'AAAAAAAA', 1, 1.1, 1.1,20,
'ABAAAAAAAAAAAA
)
;
*** ERROR[8448] Unable to access Hbase interface. Call to ExpHbaseInterfa
java.util.
java.util.
java.util.
org.trafodion.
. [2014-10-05 21:47:01]
*** ERROR[4082] Object TRAFODION.
This is seen on the v1004 build installed on clusters. It may require some new hbase settings so that Trafodion would work more smoothly on hbase 0.98. At this moment, more research is needed to figure out what these settings are.
This is failing our tests quite frequently and quite randomly, so a critical bug report is created to track this problem.
Changed in trafodion: | |
assignee: | nobody → khaled Bouaziz (khaled-bouaziz) |
Changed in trafodion: | |
assignee: | khaled Bouaziz (khaled-bouaziz) → Suresh Subbiah (suresh-subbiah) |
Changed in trafodion: | |
assignee: | Suresh Subbiah (suresh-subbiah) → Atanu Mishra (atanu-mishra) |
milestone: | none → r0.9 |
tags: |
added: dtm removed: sql-exe |
tags: |
added: hbase removed: dtm |
Changed in trafodion: | |
status: | In Progress → Triaged |
milestone: | r0.9 → none |
From https://issues.apache.org/jira/browse/HBASE-11295:
So I think I figured out the problem. It is that when a scan request takes too long to process, the RPC connection times out. It is not a client timeout issue, as there are retries from the client, and it seems that when another RPC connection is reestablished the nextCallSeq information on the client side is lost. Increasing the RPC timeout and decreasing scanner caching both work, but they also impose a performance penalty, so I am working to find a way around that.
For Trafodion we could try with these 2 settings in hbase-site.xml
hbase.client.scanner.timeout.period
300000
hbase.rpc.timeout
300000
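As a sketch of how the two settings above might look in hbase-site.xml (assuming the standard Hadoop-style property layout, with both values in milliseconds, so 300000 is 5 minutes):

```xml
<!-- Fragment for hbase-site.xml; place inside the existing <configuration> element. -->
<!-- Both values are in milliseconds: 300000 ms = 5 minutes. -->
<property>
  <name>hbase.client.scanner.timeout.period</name>
  <value>300000</value>
</property>
<property>
  <name>hbase.rpc.timeout</name>
  <value>300000</value>
</property>
```

Whether these values are adequate for the regression workload is untested here. HBase and its clients would probably need a restart to pick up the change.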
Also this cqd should help
cqd hbase_num_cache_rows_max '1000'; -- we could also try 5000 here; the current default is 10,000