hbase split starvation due to transactions.

Bug #1449190 reported by Guy Groulx
Affects: Trafodion
Status: In Progress
Importance: High
Assigned to: Oliver Bucaojit

Bug Description

We ran a longevity test on a system, running OE with 512 drivers.
Our max HFile size was set to 10 GB.

After a while, the following messages were noticed in some of the HBase regionserver logs:
2015-04-27 10:35:06,990 INFO [regionserver60020-splits-1430121725808] transactional.TrxRegionObserver: Delaying split due to transactions present. Delayed : 153 minute(s) on TRAFODION.JAVABENCH.OE_ORDERLINE_512,\x00\x00\x00\x04\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00,1430066791987.a1e39f281243d24c45d615c1b950f2a8.
2015-04-27 10:35:13,926 INFO [regionserver60020-splits-1430123472882] transactional.TrxRegionObserver: Delaying split due to transactions present. Delayed : 124 minute(s) on TRAFODION.JAVABENCH.OE_ORDERLINE_512,\x00\x00\x00\x04\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00,1430066791987.a1e39f281243d24c45d615c1b950f2a8.

Looking at the hdfs GUI:
Contents of directory /apps/hbase/data/data/default/TRAFODION.JAVABENCH.OE_ORDERLINE_512/a1e39f281243d24c45d615c1b950f2a8/
04ae7ce619d24b0094d85d5c39ebf8a6 file 72.49 MB
559232b70b5340ddaa289a30dc4d7d2c file 14.66 GB <== This is over the 10 GB limit.
6c253a61ee344b1bb39d2f3a669103d3 file 72.56 MB
8837fc13d3a241d493b3ddcbd160d869 file 72.49 MB
901c5708daa048599de8d1441ed5ea89 file 72.48 MB
b9d4bb1179414f9686f1f3271a2b434b file 72.56 MB
bbe2057994194a2693897bb5323a89fd file 72.56 MB

Notice that the 2nd entry is well over the 10 GB limit. The region can't split because there are active transactions, and because our 512 drivers never let up, the split is starved.

Once we killed the drivers, stopping new transactions, the split happened almost instantly.
Hall, Gary (1:48 PM): winding down...
2015-04-27 17:48:57,235 INFO [regionserver60020-splits-1430135811050] regionserver.SplitRequest: Region split, hbase:meta updated, and report to master. Parent=TRAFODION.JAVABENCH.OE_ORDERLINE_512,\x00\x00\x00\x0F\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00,1430066791987.2fb76848eeb9b1516ae7a80500e8870c., new regions: TRAFODION.JAVABENCH.OE_ORDERLINE_512,\x00\x00\x00\x0F\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00,1430135811140.4cd059f95ecba06425a5c39592de4157., TRAFODION.JAVABENCH.OE_ORDERLINE_512,\x00\x00\x00\x0F\x80\x00\x00\xE8\x80\x00\x00\x05\x80\x00\x000\x80\x00\x00\x07,1430135811140.62d1d89a34607dfde0ec66d18ad6e91f.. Split took 5hrs, 52mins, 6sec

The log above reports 5 hrs 52 mins, but the split itself took less than a minute once the transactions stopped.
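
The reported duration looks like it is measured from when the split was first requested rather than from when it was allowed to proceed. As a rough check, assuming the numeric suffix in the thread name (splits-1430135811050) is the request time in epoch milliseconds and that the regionserver log timestamps are in UTC (neither is confirmed in this report), the arithmetic reproduces the logged figure:

```java
import java.time.Duration;
import java.time.Instant;
import java.time.LocalDateTime;
import java.time.ZoneOffset;

public class SplitDurationCheck {
    // Assumption: the thread-name suffix is the split request time in epoch millis.
    static final long REQUEST_MILLIS = 1430135811050L;

    // Assumption: the log line timestamp (2015-04-27 17:48:57,235) is in UTC.
    static final LocalDateTime LOGGED_AT =
            LocalDateTime.of(2015, 4, 27, 17, 48, 57, 235_000_000);

    // Time between the assumed request instant and the "Region split" log line.
    static Duration elapsed() {
        Instant requested = Instant.ofEpochMilli(REQUEST_MILLIS);
        Instant completed = LOGGED_AT.toInstant(ZoneOffset.UTC);
        return Duration.between(requested, completed);
    }

    public static void main(String[] args) {
        Duration d = elapsed();
        // prints "5hrs, 52mins, 6sec"
        System.out.printf("%dhrs, %dmins, %dsec%n",
                d.toHours(), d.toMinutes() % 60, d.getSeconds() % 60);
    }
}
```

If those assumptions hold, nearly all of the 5 hrs 52 mins was time spent in the delay loop, which is consistent with the split completing right after the drivers were stopped.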

We understand that a split must be delayed until transactions have stopped, but in a high-transaction environment we need to make sure that a window is given for splits to actually happen.

Tags: dtm
Atanu Mishra (atanu-mishra) wrote :
Changed in trafodion:
milestone: none → r2.0
assignee: nobody → Oliver Bucaojit (oliver-bucaojit)
Changed in trafodion:
status: New → In Progress
Oliver Bucaojit (oliver-bucaojit) wrote :

I have added split configuration properties that change the behavior of the preSplit operation, so users can choose the behavior they prefer. Two changes that may help with the split delay are checking more often for transactions to be absent (every 15 seconds instead of 60) and always delaying a split while pending transactions are present (this keeps the database consistent, instead of having to wait for the recovery thread to handle it).

In the longer term, I am working on how to handle the split and balance operations without having to delay the operation. Design for that is still in progress and can be found in the link above from Atanu.

Configuration properties for the server-side split delay behavior (set these in hbase-site.xml):
https://rndwiki.corp.hpecorp.net/confluence/display/seaquestbigdata/TM+Configuration+Properties

hbase.transaction.split.drain.early
If 'true', the split operation will not wait for active transactions to complete. It will still wait for pending transactions.
Default is 'false'

hbase.transaction.split.delay.limit
Sets the region split delay limit, in minutes.
Once the delay limit is exceeded, HBase splits are no longer blocked from the TM perspective. HBase may still delay them due to GC or other operations.
Default will be 360, which is a 6 hour delay max.

hbase.transaction.split.active.delay
Sets the interval, in milliseconds, at which the preSplit observer polls for the active transaction list to be empty.
Default is 15000 (15 seconds)

hbase.transaction.split.pending.delay
Sets the interval, in milliseconds, at which the preSplit observer polls for the pending transaction list to be empty.
Default is 500 (0.5 seconds)
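
To illustrate how the poll interval and the delay limit interact, here is a minimal sketch of a bounded wait loop. It is not the TrxRegionObserver code: the class, method, and field names are hypothetical, time is simulated rather than slept, and the limit is assumed to be in minutes as the 360 = 6 hours default suggests.

```java
import java.util.function.IntSupplier;

public class SplitDelaySketch {
    /** Hypothetical outcome: whether transactions drained, and how long we waited. */
    public static final class Result {
        public final boolean drained;
        public final long waitedMillis;

        Result(boolean drained, long waitedMillis) {
            this.drained = drained;
            this.waitedMillis = waitedMillis;
        }

        @Override
        public String toString() {
            return "drained=" + drained + ", waitedMillis=" + waitedMillis;
        }
    }

    /**
     * Polls until no active transactions remain or the delay limit is reached.
     * Time is simulated (no real sleeping), so the sketch runs instantly.
     */
    public static Result waitForDrain(IntSupplier activeTxCount, long pollMillis, long limitMinutes) {
        long limitMillis = limitMinutes * 60_000L;
        long waited = 0;
        while (activeTxCount.getAsInt() > 0) {
            if (waited >= limitMillis) {
                return new Result(false, waited); // limit hit: stop blocking the split
            }
            waited += pollMillis; // stands in for Thread.sleep(pollMillis)
        }
        return new Result(true, waited);
    }

    public static void main(String[] args) {
        // Drivers never let up: every poll still sees active transactions.
        System.out.println(waitForDrain(() -> 512, 15_000L, 360L));

        // Drivers stop after three polls: the wait ends shortly afterwards.
        int[] busyPolls = {3};
        System.out.println(waitForDrain(() -> busyPolls[0]-- > 0 ? 1 : 0, 15_000L, 360L));
    }
}
```

With the defaults quoted above, a workload that never drains waits the full 360 minutes (1440 polls at 15 seconds each) before the limit releases the split, while a workload that pauses is noticed within one poll interval. That is why a shorter active poll interval helps only when the workload has gaps, and the delay limit is the only bound when it does not.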
