The problem happens when the HiveMetaData object tries to do an initConnection through a JNI call. The hang occurs on the Java side. Hans and Arvind captured the following trace:
jstack 20514
…
"main" prio=10 tid=0x0000000000f5b000 nid=0x50be in Object.wait() [0x00007fffe57de000]
java.lang.Thread.State: WAITING (on object monitor)
at java.lang.Object.wait(Native Method)
- waiting on <0x00000000f72d9d40> (a java.lang.UNIXProcess)
at java.lang.Object.wait(Object.java:503)
at java.lang.UNIXProcess.waitFor(UNIXProcess.java:210)
- locked <0x00000000f72d9d40> (a java.lang.UNIXProcess)
at org.apache.hadoop.util.Shell.runCommand(Shell.java:250)
at org.apache.hadoop.util.Shell.run(Shell.java:188)
at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:381)
at org.apache.hadoop.util.Shell.execCommand(Shell.java:467)
at org.apache.hadoop.util.Shell.execCommand(Shell.java:450)
at org.apache.hadoop.security.ShellBasedUnixGroupsMapping.getUnixGroups(ShellBasedUnixGroupsMapping.java:86)
at org.apache.hadoop.security.ShellBasedUnixGroupsMapping.getGroups(ShellBasedUnixGroupsMapping.java:55)
at org.apache.hadoop.security.JniBasedUnixGroupsMappingWithFallback.getGroups(JniBasedUnixGroupsMappingWithFallback.java:50)
at org.apache.hadoop.security.Groups.getGroups(Groups.java:95)
at org.apache.hadoop.security.UserGroupInformation.getGroupNames(UserGroupInformation.java:1292)
- locked <0x00000000fd57d448> (a org.apache.hadoop.security.UserGroupInformation)
at org.apache.hadoop.hive.metastore.HiveMetaStoreClient.open(HiveMetaStoreClient.java:293)
at org.apache.hadoop.hive.metastore.HiveMetaStoreClient.<init>(HiveMetaStoreClient.java:163)
at org.trafodion.sql.HBaseAccess.HiveClient.init(HiveClient.java:85)
…
The Java layer in mxosrvr starts a bash shell to determine the Unix groups for the current user, using this command:
bash -c id -Gn
For some reason this command never returns when the process is mxosrvr, although it works fine from sqlci. On Spinel we are using a HiveMetastoreServer; it is not clear whether that is related to this problem. The next step is to check whether the problem can be reproduced on a workstation.
home/squser2> ps -aef | grep 20514
squser2 20514 18924 0 22:17 ? 00:00:09 mxosrvr -ZkHost n002.cm.cluster:2181,n003.cm.cluster:2181,n004.cm.cluster:2181 -RZ g4q0985.houston.hp.com:175 -ZkPnode /squser2 -CNGTO 60 -ZKSTO 180 -EADSCO 0 -TCPADD 16.235.163.238
squser2 37031 20514 0 22:26 ? 00:00:00 [id] <defunct>
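For the workstation check, a small standalone probe along these lines might help. It is only a sketch: GroupLookupProbe and runWithTimeout are hypothetical names, not part of Hadoop or Trafodion, and the argument quoting ("bash", "-c", "id -Gn") is an assumption based on the command as written in this report. It uses a bounded wait, which needs Java 8 or later, instead of the unbounded UNIXProcess.waitFor() seen in the trace. A standalone run only shows whether the command itself returns for that user; reproducing the hang would presumably require making the same call from inside the mxosrvr process.

```java
import java.io.File;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.util.concurrent.TimeUnit;

/** Hypothetical diagnostic helper; not part of Hadoop or Trafodion. */
final class GroupLookupProbe {

    /**
     * Runs a command and returns its trimmed output, or null if it does not
     * finish within timeoutSeconds. Output is redirected to a temp file so
     * that reading it can never block on a hung child process.
     */
    static String runWithTimeout(long timeoutSeconds, String... command) {
        File out = null;
        try {
            out = File.createTempFile("group-probe", ".out");
            Process p = new ProcessBuilder(command)
                    .redirectErrorStream(true)
                    .redirectOutput(out)
                    .start();
            // Bounded wait (Java 8+), unlike the unbounded waitFor() in the trace.
            if (!p.waitFor(timeoutSeconds, TimeUnit.SECONDS)) {
                p.destroyForcibly();
                return null;
            }
            byte[] bytes = Files.readAllBytes(out.toPath());
            return new String(bytes, StandardCharsets.UTF_8).trim();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting", e);
        } finally {
            if (out != null) {
                out.delete();
            }
        }
    }

    public static void main(String[] args) {
        // Assumed quoting; the report gives the command as "bash -c id -Gn".
        String groups = runWithTimeout(10, "bash", "-c", "id -Gn");
        System.out.println(groups == null ? "TIMED OUT" : "groups: " + groups);
    }
}
```

If the probe prints the group list on the workstation but the same lookup still hangs under mxosrvr, that would point at the mxosrvr process environment rather than at the id command itself.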