ENDTRANSACTION hang, transaction state FORGETTING
Affects | Status | Importance | Assigned to | Milestone | |
---|---|---|---|---|---|
Trafodion | In Progress | High | John de Roo | | |
Bug Description
A loop that re-executes the seabase developer regression suite hung on its 14th iteration, in TEST016. The sqlci console looked like this:
>>-- char type
>>create table mcStatPart1
+>(a int not null not droppable,
+>b char(10) not null not droppable,
+>f int, txt char(100),
+>primary key (a,b))
+>salt using 8 partitions ;
--- SQL operation complete.
>>
>>insert into mcStatPart1 values (1,'123'
+> (3,'123'
A pstack of the sqlci process (0,13231) showed it blocked in a call to ENDTRANSACTION, and dtmci showed this for the transaction:
DTMCI > list
Transid Owner eventQ pending Joiners TSEs State
(0,13742) 0,13231 0 0 0 0 FORGETTING
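A loop that re-runs the suite could check for this condition automatically. The following is a minimal sketch that parses the `list` output shown above and picks out transactions in a given state. It assumes the columns are whitespace-separated exactly as in this paste; the function and type names are hypothetical and are not part of dtmci.

```python
from typing import List, NamedTuple


class TxnRow(NamedTuple):
    """One data row of the dtmci `list` output, as pasted in this report."""
    transid: str
    owner: str
    event_q: int
    pending: int
    joiners: int
    tses: int
    state: str


def parse_dtmci_list(text: str) -> List[TxnRow]:
    """Parse dtmci `list` output (layout assumed from the paste above)."""
    rows = []
    for line in text.splitlines():
        fields = line.split()
        # Data rows have seven fields and start with a transid like "(0,13742)".
        # The prompt line and the column header are skipped by this check.
        if len(fields) != 7 or not fields[0].startswith("("):
            continue
        transid, owner, event_q, pending, joiners, tses, state = fields
        rows.append(TxnRow(transid, owner, int(event_q), int(pending),
                           int(joiners), int(tses), state))
    return rows


def stuck_in(rows: List[TxnRow], state: str = "FORGETTING") -> List[TxnRow]:
    """Return the rows whose State column equals `state`."""
    return [row for row in rows if row.state == state]


SAMPLE = """\
DTMCI > list
Transid Owner eventQ pending Joiners TSEs State
(0,13742) 0,13231 0 0 0 0 FORGETTING
"""

if __name__ == "__main__":
    for row in stuck_in(parse_dtmci_list(SAMPLE)):
        print(row.transid, "owned by", row.owner, "is in state", row.state)
```

A single FORGETTING row does not by itself mean a hang; a harness would need to see the same transid in that state across repeated samples before flagging it.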
Here's a copy of Sean's analysis:
From: Broeder, Sean
Sent: Wednesday, January 21, 2015 8:43 AM
To: Hanlon, Mike; Cooper, Joanie
Cc: DeRoo, John
Subject: RE: ENDTRANSACTION hang, transaction state FORGETTING
Hi Mike,
It looks like we have a zookeeper problem right at the time of the commit. A table is offline:
2015-01-21 11:13:45,529 WARN zookeeper.ZKUtil: hconnection-
org.apache.
Then we fail after 3 retries of sending the commit request:
2015-01-21 11:14:04,405 ERROR transactional.
2015-01-21 11:14:04,405 ERROR transactional.
Normally we would create a recovery entry for this transaction to redrive the commit, but it appears we are unable to do that because of the zookeeper errors:
2015-01-21 11:14:04,408 DEBUG client.
471340 2015-01-21 11:14:05,255 WARN zookeeper.
471341 2015-01-21 11:14:05,256 WARN zookeeper.
471342 2015-01-21 11:14:05,256 INFO util.RetryCounter: Sleeping 1000ms before retry #0...
471343 2015-01-21 11:14:05,256 INFO util.RetryCounter: Sleeping 1000ms before retry #0...
HBase itself looks like it’s having trouble, as I can’t even do a list operation from the hbase shell:
2015-01-21 14:40:28,816 ERROR [main] client.
We need to think of how better to handle this in the TransactionManager, but in reality I’m not sure what we can do if Zookeeper fails. You can open an LP bug so we have record of it and can discuss what to do.
Thanks,
Sean
_______
From: Hanlon, Mike
Sent: Wednesday, January 21, 2015 6:17 AM
To: Cooper, Joanie
Cc: Broeder, Sean; DeRoo, John
Subject: ENDTRANSACTION hang, transaction state FORGETTING
Hi Joanie,
Have we seen this before? A SQL regression test (in this case seabase/TEST016) hangs in a call to ENDTRANSACTION. The transaction state is shown in dtmci to be FORGETTING. It probably is not easy to reproduce, since the problem occurred on the 14th iteration of a loop to re-execute the seabase suite.
There are a lot of messages in /opt/home/
thanks
Mike
Changed in trafodion:
milestone: r1.1 → r2.0
status: New → In Progress
This needs to be thought through.
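As a starting point for that discussion, here is a rough sketch of the control flow Sean describes: a bounded number of commit attempts, then a fallback to a recovery entry, and an explicit error if the recovery entry cannot be written either, so the caller is not left blocked. This is not the TransactionManager's actual code. The attempt count of 3 and the 1000 ms pause come from the analysis and log excerpt above; every name here (`end_transaction`, `send_commit`, `write_recovery_entry`, `CommitUnresolved`) is hypothetical.

```python
import time
from typing import Callable


class CommitUnresolved(Exception):
    """Neither the commit nor the recovery entry could be completed."""


def end_transaction(send_commit: Callable[[], None],
                    write_recovery_entry: Callable[[], None],
                    max_attempts: int = 3,
                    backoff_s: float = 1.0,
                    sleep: Callable[[float], None] = time.sleep) -> str:
    """Try to commit; fall back to a recovery entry; never wait forever.

    `send_commit` and `write_recovery_entry` are caller-supplied stand-ins
    for the real operations and are expected to raise on failure.
    """
    last_error = None
    for attempt in range(max_attempts):
        try:
            send_commit()
            return "committed"
        except Exception as exc:  # sketch only: real code would be narrower
            last_error = exc
            if attempt + 1 < max_attempts:
                sleep(backoff_s)  # pause between attempts, not after the last

    # All commit attempts failed: record the transaction so that the commit
    # can be redriven later.
    try:
        write_recovery_entry()
        return "recovery_pending"
    except Exception as exc:
        # This is the situation in this bug: the recovery path is also down.
        # Surface it to the caller instead of blocking.
        raise CommitUnresolved(
            f"commit failed after {max_attempts} attempts ({last_error!r}) "
            "and the recovery entry could not be written"
        ) from exc
```

Whether returning an error to the application is acceptable when the commit outcome is unknown is exactly the open question here; the sketch only shows where such a decision point would sit.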