select on hive table hanging via dcs
Bug #1329361 reported by Guy Groulx
This bug affects 1 person
Affects | Status | Importance | Assigned to | Milestone | |
---|---|---|---|---|---|
Trafodion | Fix Committed | Critical | Suresh Subbiah | | |
Bug Description
Created hive.hive.
From sqlci, select * from hive.hive.
But from trafci, the same select simply hangs.
Changed in trafodion: | |
assignee: | nobody → Suresh Subbiah (suresh-subbiah) |
status: | New → In Progress |
Changed in trafodion: | |
status: | Invalid → New |
Changed in trafodion: | |
milestone: | none → r1.1 |
Changed in trafodion: | |
status: | New → In Progress |
Changed in trafodion: | |
status: | In Progress → Fix Committed |
The problem happens when the HiveMetaData object tries to do an initConnection with a JNI call. The hang occurs on the Java side and has this trace, found by Hans and Arvind.
Jstack 20514
…
"main" prio=10 tid=0x0000000000f5b000 nid=0x50be in Object.wait() [0x00007fffe57de000]
java.lang.Thread.State: WAITING (on object monitor)
at java.lang.Object.wait(Native Method)
- waiting on <0x00000000f72d9d40> (a java.lang.UNIXProcess)
at java.lang.Object.wait(Object.java:503)
at java.lang.UNIXProcess.waitFor(UNIXProcess.java:210)
- locked <0x00000000f72d9d40> (a java.lang.UNIXProcess)
at org.apache.hadoop.util.Shell.runCommand(Shell.java:250)
at org.apache.hadoop.util.Shell.run(Shell.java:188)
at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:381)
at org.apache.hadoop.util.Shell.execCommand(Shell.java:467)
at org.apache.hadoop.util.Shell.execCommand(Shell.java:450)
at org.apache.hadoop.security.ShellBasedUnixGroupsMapping.getUnixGroups(ShellBasedUnixGroupsMapping.java:86)
at org.apache.hadoop.security.ShellBasedUnixGroupsMapping.getGroups(ShellBasedUnixGroupsMapping.java:55)
at org.apache.hadoop.security.JniBasedUnixGroupsMappingWithFallback.getGroups(JniBasedUnixGroupsMappingWithFallback.java:50)
at org.apache.hadoop.security.Groups.getGroups(Groups.java:95)
at org.apache.hadoop.security.UserGroupInformation.getGroupNames(UserGroupInformation.java:1292)
- locked <0x00000000fd57d448> (a org.apache.hadoop.security.UserGroupInformation)
at org.apache.hadoop.hive.metastore.HiveMetaStoreClient.open(HiveMetaStoreClient.java:293)
at org.apache.hadoop.hive.metastore.HiveMetaStoreClient.<init>(HiveMetaStoreClient.java:163)
at org.trafodion.sql.HBaseAccess.HiveClient.init(HiveClient.java:85)
…
home/squser2> ps -aef | grep 20514
squser2 20514 18924 0 22:17 ? 00:00:09 mxosrvr -ZkHost n002.cm.cluster:2181,n003.cm.cluster:2181,n004.cm.cluster:2181 -RZ g4q0985.houston.hp.com:175 -ZkPnode /squser2 -CNGTO 60 -ZKSTO 180 -EADSCO 0 -TCPADD 16.235.163.238
squser2 37031 20514 0 22:26 ? 00:00:00 [id] <defunct>
The Java layer in mxosrvr fires up a bash shell to determine the Unix groups for the current user with this command:
bash -c id -Gn
Somehow this command never comes back when the process is mxosrvr. It works fine from sqlci. On Spinel we are using a HiveMetastoreServer. It is not clear if this problem could be related. The next step is to check if this problem can be reproduced on a workstation.
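For anyone trying to reproduce this outside mxosrvr, the group lookup boils down to starting a child process and waiting for it to exit. The following is only a minimal standalone sketch, not Hadoop's or Trafodion's code: the class name GroupLookupProbe, the method runWithTimeout, and the 10-second limit are hypothetical, and it assumes Java 9 or later and that the command is passed as the argument vector bash, -c, "id -Gn". Unlike the unbounded waitFor() in the trace, it gives up after a timeout, so a stuck child shows up as a message rather than a hang.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.concurrent.TimeUnit;

public class GroupLookupProbe {

    /**
     * Runs a command and waits at most timeoutSeconds for it to exit.
     * Returns the trimmed output, or null if the child did not exit in time.
     * Output is read only after the child exits, which is fine for short
     * output such as a group list but would stall on very large output.
     */
    public static String runWithTimeout(String[] command, long timeoutSeconds) {
        try {
            Process child = new ProcessBuilder(command)
                    .redirectErrorStream(true) // merge stderr into stdout
                    .start();
            // Bounded wait: the trace above shows the unbounded variant blocking forever.
            if (!child.waitFor(timeoutSeconds, TimeUnit.SECONDS)) {
                child.destroyForcibly();
                return null;
            }
            byte[] output = child.getInputStream().readAllBytes();
            return new String(output, StandardCharsets.UTF_8).trim();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for child", e);
        }
    }

    public static void main(String[] args) {
        // Assumed form of the lookup command quoted in this comment.
        String groups = runWithTimeout(new String[] {"bash", "-c", "id -Gn"}, 10);
        System.out.println(groups == null
                ? "group lookup did not return within the timeout"
                : "groups: " + groups);
    }
}
```

Run from a plain shell, this should print the current user's groups. If the same call were made from inside a process showing the behavior above, the timeout branch would be taken instead of blocking indefinitely.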