Getting TM error 97 when our tables split or get moved.

Bug #1274651 reported by Guy Groulx
This bug affects 2 people
Affects: Trafodion
Status: Fix Released
Importance: Critical
Assigned to: John de Roo
Milestone: r1.1

Bug Description

Testing with transactions enabled.
Our system uses Hortonworks and runs HBase 0.94 across 12 nodes.
Our HBase max store size was 1 GB.

We noticed, while loading large tables, that we would get error 97 back from the TM and that the batch of rows was not added.

It turns out that our table was being split and that the TM does not handle splits at the moment.
We also found that after a split, the HBase balancer would move the new region to another region server. When this happened, we got more error 97s.

WORKAROUND (a sketch of the same changes through the Java client API follows this list):
- We changed the MAX STORE SIZE to 100 GB.
- We changed the SPLIT POLICY to org.apache.hadoop.hbase.regionserver.ConstantSizeRegionSplitPolicy, which causes a split only once the max store size is reached. The default for HBase 0.94 and up is the new IncreasingToUpperBoundRegionSplitPolicy, which causes splits more often.
- We turned off the HBase balancer via the hbase shell.
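
For reference, a minimal sketch of the same three changes through the HBase 0.94 Java client API (we applied them via the hbase shell; the table name MYTABLE here is hypothetical):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.HTableDescriptor;
import org.apache.hadoop.hbase.client.HBaseAdmin;
import org.apache.hadoop.hbase.util.Bytes;

public class SplitWorkaround {
    public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        HBaseAdmin admin = new HBaseAdmin(conf);

        // Raise the max store size to 100 GB so size-based splits become rare.
        HTableDescriptor desc = admin.getTableDescriptor(Bytes.toBytes("MYTABLE"));
        desc.setMaxFileSize(100L * 1024 * 1024 * 1024);

        // Split only when the max store size is actually reached, instead of
        // the 0.94+ default IncreasingToUpperBoundRegionSplitPolicy.
        desc.setValue("SPLIT_POLICY",
                "org.apache.hadoop.hbase.regionserver.ConstantSizeRegionSplitPolicy");

        admin.disableTable("MYTABLE");
        admin.modifyTable(Bytes.toBytes("MYTABLE"), desc);
        admin.enableTable("MYTABLE");

        // Stop the balancer from moving new regions between region servers
        // (same effect as 'balance_switch false' in the hbase shell).
        admin.balanceSwitch(false);

        admin.close();
    }
}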

Tags: dtm
Guy Groulx (guy-groulx)
tags: added: transaction
John de Roo (john-deroo)
Changed in trafodion:
status: New → In Progress
assignee: nobody → John de Roo (john-deroo)
John de Roo (john-deroo) wrote :

Revision 39062. Partial fix merged from the seatrans_2 branch to the datalake_64 branch: added code to the region close path that suspends region splits until there are no active transactions in the region. This can be configured through an environment variable; the default is on. The code already prohibits region splits while transactions are in phase 2, to ensure DB consistency.
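
This is not the actual Trafodion change, but a minimal sketch of the idea using an HBase RegionObserver: hold up a split while transactions are still active in the region. The environment-variable name TM_ENABLE_SPLIT_SUSPEND and the transaction counter are assumptions for illustration.

import java.util.concurrent.atomic.AtomicInteger;

import org.apache.hadoop.hbase.coprocessor.BaseRegionObserver;
import org.apache.hadoop.hbase.coprocessor.ObserverContext;
import org.apache.hadoop.hbase.coprocessor.RegionCoprocessorEnvironment;

public class TxnAwareSplitObserver extends BaseRegionObserver {

    // Hypothetical count of transactions active in this region, maintained
    // by the transactional endpoint (begin/commit/abort paths not shown).
    private final AtomicInteger activeTxns = new AtomicInteger(0);

    // Feature toggle defaulting to on, mirroring the environment variable
    // mentioned in the fix (the variable name here is an assumption).
    private static final boolean SUSPEND_SPLITS =
            !"0".equals(System.getenv("TM_ENABLE_SPLIT_SUSPEND"));

    @Override
    public void preSplit(ObserverContext<RegionCoprocessorEnvironment> c) {
        // Hold the split thread until every active transaction has drained,
        // so no in-flight transactional work is lost when the region closes.
        while (SUSPEND_SPLITS && activeTxns.get() > 0) {
            try {
                Thread.sleep(100);
            } catch (InterruptedException ie) {
                Thread.currentThread().interrupt();
                return; // stop waiting if the split thread is interrupted
            }
        }
    }
}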

tags: added: dtm
removed: transaction
Guy Groulx (guy-groulx)
Changed in trafodion:
importance: High → Critical
information type: Proprietary → Public
Changed in trafodion:
milestone: none → r1.0
Changed in trafodion:
milestone: r1.0 → r1.1
Guy Groulx (guy-groulx) wrote :

January 28th. During a longevity test, a client got:
1>----------------------------------------------------------
1> Unexpected EXCEPTION : *** ERROR[8448] Unable to access Hbase interface. Call to ExpHbaseInterface::nextRow returned error HBASE_ACCESS_ERROR(-705). Cause:
java.util.concurrent.ExecutionException: java.io.IOException: PerformScan error on coprocessor call, scannerID: 3082879
java.util.concurrent.FutureTask.report(FutureTask.java:122)
java.util.concurrent.FutureTask.get(FutureTask.java:188)
org.trafodion.sql.HBaseAccess.HTableClient.fetchRows(HTableClient.java:419)
. [2015-01-28 19:02:59]
1> Stream Number : 1
1>----------------------------------------------------------

102>----------------------------------------------------------
102> Unexpected EXCEPTION : *** ERROR[8448] Unable to access Hbase interface. Call to ExpHbaseInterface::nextRow returned error HBASE_ACCESS_ERROR(-705). Cause:
java.util.concurrent.ExecutionException: java.io.IOException: PerformScan error on coprocessor call, scannerID: 0
java.util.concurrent.FutureTask.report(FutureTask.java:122)
java.util.concurrent.FutureTask.get(FutureTask.java:188)
org.trafodion.sql.HBaseAccess.HTableClient.fetchRows(HTableClient.java:419)
. [2015-01-28 19:03:00]
102> Stream Number : 102
102>----------------------------------------------------------

The region server log and trafodion.dtm.log files show that a split occurred on a table.
ZooKeeper recovery information was updated with a region and does not get cleared.

Other clients continued to work.

Oliver Bucaojit (oliver-bucaojit) wrote :

Our latest changes to set the closing flag seem to have fixed the split issues we have been seeing. I also have not heard of any new problems related to this bug, so I'll go ahead and mark it as resolved.

The closing-flag change was checked into the stable 1.0.1 branch on Feb 17 and into mainline on Feb 14, so it is in the released code.
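
For the record, a minimal sketch of the closing-flag idea (not the actual patch; all names here are hypothetical): the region-level transactional state sets a volatile flag as soon as the region begins closing for a split or a balancer move, and every transactional call checks it first, failing fast with a retryable error instead of racing the close.

import java.io.IOException;

public class TxnRegionState {

    // Set once the region begins closing (split or balancer move) and never
    // cleared, since the region itself is about to go away.
    private volatile boolean closing = false;

    // Called from the region close path before in-flight work is drained.
    public void markClosing() {
        closing = true;
    }

    // Called at the top of every transactional operation on this region so
    // callers fail fast with a retryable error instead of racing the close.
    public void checkNotClosing() throws IOException {
        if (closing) {
            throw new IOException(
                    "Region is closing; retry against the daughter or moved region");
        }
    }
}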

Changed in trafodion:
status: In Progress → Fix Released