HBase-trx TrxRegionEndpoint starts a transaction for a performScan operation on a scanner that has already been closed

Bug #1439421 reported by Joanie Cooper
Affects: Trafodion
Status: Fix Committed
Importance: High
Assigned to: Joanie Cooper

Bug Description

A TrxRegionEndpoint "performScan" protobuf call is received after the previously opened scanner has been closed and the transaction identifier has been retired. The subsequent "performScan" operation targets the same region, which has already completed a full protocol series of TrxRegion "openScanner/performScan/closeScanner" protobuf operations and has retired the transaction.

As the transaction has been retired, the TrxRegionEndpoint coprocessor will create a new transactionState object and try to find a scanner object.

As the previous scanner has been closed, a null is returned.

Previously, we were also on a higher scanner number, e.g. 23 for this table.
The performScan is attempting a scan operation on a scanner with a scanner id of “0”.

hbase-cmf-hbase-REGIONSERVER-centos-ah2.hpl.hp.com.log.out:2015-03-09 09:29:21,434 TRACE org.apache.hadoop.hbase.coprocessor.transactional.TrxRegionEndpoint: TrxRegionEndpoint coprocessor: performScan - txId 136052, scanner id 0, numberOfRows 2614, nextCallSeq 0, closeScanner is false, region is TRAFODION.BENCH56.OE_ORDERLINE_12,\x00\x00\x00\x05\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00,1425744440518.c435085cabe98dbb4e31420a47ad95c4.

hbase-cmf-hbase-REGIONSERVER-centos-ah2.hpl.hp.com.log.out:2015-03-09 09:29:21,434 TRACE org.apache.hadoop.hbase.coprocessor.transactional.TrxRegionEndpoint: TrxRegionEndpoint coprocessor: performScan - txId 136052, performScan rsh is null, UnknownScannerException for scannerId: 0, nextCallSeq was 1, for region TRAFODION.BENCH56.OE_ORDERLINE_12,\x00\x00\x00\x05\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00,1425744440518.c435085cabe98dbb4e31420a47ad95c4

The client does receive the "performScan" -8448 error for the query.

We should be able to determine that we have successfully sent all rows for the scan, and not allow a subsequent "performScan" to start a new transaction state object in the TrxRegionEndpoint coprocessor that errors out immediately.

Analysis is being done to determine why the additional "performScan" is being sent.

Changed in trafodion:
status: New → In Progress
importance: Undecided → High
assignee: nobody → Joanie Cooper (joanie-cooper)
Joanie Cooper (joanie-cooper) wrote :

A new check was added to the "performScan" and "closeScanner" protobuf calls in the regionserver to confirm that the supplied transaction identifier belongs to an active transaction. If it is not active, then no "openScanner" protobuf call was made beforehand for that transaction. A new OutOfOrderProtocolException can optionally be returned when there is no active transaction. The exception is optional because of a flow from SQL in which a prefetch scan operation is issued after the regular scan operation has completed, the scanner has been closed, and the transaction has been retired. In this scenario, if a subsequent "performScan" or "closeScanner" operation is performed, a warning could be posted and a positive return code could be returned with no additional rows for SQL to process. The default is to return the OutOfOrderProtocolException.
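
To make the check described above concrete, here is a minimal, self-contained Java sketch. It is not the committed TrxRegionEndpoint code. The class ScanGuardSketch, the suppressOutOfOrder flag, and the methods openScanner, retire and checkActive are all hypothetical names invented for illustration. OutOfOrderProtocolException is declared locally as a stand-in, and the real class may differ in package, parent type and constructor.

```java
import java.util.HashSet;
import java.util.Set;

// Local stand-in for the exception named in the comment above (assumption:
// the real class in Trafodion may extend a different type).
class OutOfOrderProtocolException extends Exception {
    OutOfOrderProtocolException(String message) {
        super(message);
    }
}

// Hypothetical guard that models "is this transaction id still active?".
class ScanGuardSketch {
    private final Set<Long> activeTransactions = new HashSet<>();

    // Hypothetical switch: false (the default behaviour described in the
    // report) throws; true logs a warning and lets the caller return no rows.
    private final boolean suppressOutOfOrder;

    ScanGuardSketch(boolean suppressOutOfOrder) {
        this.suppressOutOfOrder = suppressOutOfOrder;
    }

    // Models "openScanner": the transaction becomes active for this region.
    void openScanner(long txId) {
        activeTransactions.add(txId);
    }

    // Models the transaction being retired after closeScanner/commit.
    void retire(long txId) {
        activeTransactions.remove(txId);
    }

    // Returns true if the operation may proceed. Returns false if the caller
    // should answer with an empty, successful result. Throws if the
    // transaction is not active and suppression is off.
    boolean checkActive(long txId, String operation)
            throws OutOfOrderProtocolException {
        if (activeTransactions.contains(txId)) {
            return true;
        }
        String message = operation + " - txId " + txId
                + " has no active transaction; openScanner was not called first";
        if (suppressOutOfOrder) {
            System.err.println("WARN: " + message);
            return false;
        }
        throw new OutOfOrderProtocolException(message);
    }
}
```

In this sketch, a "performScan" for txId 136052 after retire(136052L) would throw under the default setting instead of creating a new transaction state and failing later with UnknownScannerException. With suppression on, the caller would get false and could return zero rows to SQL.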

Changed in trafodion:
status: In Progress → Fix Committed