Comment 1 for bug 1329361

Suresh Subbiah (suresh-subbiah) wrote :

The problem happens when the HiveMetaData object tries to do an initConnection with a JNI call. The hang occurs on the Java side; Hans and Arvind captured the following trace.
jstack 20514

"main" prio=10 tid=0x0000000000f5b000 nid=0x50be in Object.wait() [0x00007fffe57de000]
   java.lang.Thread.State: WAITING (on object monitor)
                at java.lang.Object.wait(Native Method)
                - waiting on <0x00000000f72d9d40> (a java.lang.UNIXProcess)
                at java.lang.Object.wait(Object.java:503)
                at java.lang.UNIXProcess.waitFor(UNIXProcess.java:210)
                - locked <0x00000000f72d9d40> (a java.lang.UNIXProcess)
                at org.apache.hadoop.util.Shell.runCommand(Shell.java:250)
                at org.apache.hadoop.util.Shell.run(Shell.java:188)
                at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:381)
                at org.apache.hadoop.util.Shell.execCommand(Shell.java:467)
                at org.apache.hadoop.util.Shell.execCommand(Shell.java:450)
                at org.apache.hadoop.security.ShellBasedUnixGroupsMapping.getUnixGroups(ShellBasedUnixGroupsMapping.java:86)
                at org.apache.hadoop.security.ShellBasedUnixGroupsMapping.getGroups(ShellBasedUnixGroupsMapping.java:55)
                at org.apache.hadoop.security.JniBasedUnixGroupsMappingWithFallback.getGroups(JniBasedUnixGroupsMappingWithFallback.java:50)
                at org.apache.hadoop.security.Groups.getGroups(Groups.java:95)
                at org.apache.hadoop.security.UserGroupInformation.getGroupNames(UserGroupInformation.java:1292)
                - locked <0x00000000fd57d448> (a org.apache.hadoop.security.UserGroupInformation)
                at org.apache.hadoop.hive.metastore.HiveMetaStoreClient.open(HiveMetaStoreClient.java:293)
                at org.apache.hadoop.hive.metastore.HiveMetaStoreClient.<init>(HiveMetaStoreClient.java:163)
                at org.trafodion.sql.HBaseAccess.HiveClient.init(HiveClient.java:85)

home/squser2> ps -aef | grep 20514
squser2 20514 18924 0 22:17 ? 00:00:09 mxosrvr -ZkHost n002.cm.cluster:2181,n003.cm.cluster:2181,n004.cm.cluster:2181 -RZ g4q0985.houston.hp.com:175 -ZkPnode /squser2 -CNGTO 60 -ZKSTO 180 -EADSCO 0 -TCPADD 16.235.163.238
squser2 37031 20514 0 22:26 ? 00:00:00 [id] <defunct>

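The Java layer in mxosrvr starts a bash shell to determine the Unix groups for the current user with this command:
bash -c id -Gn
This command never returns when the calling process is mxosrvr, and the cause is not yet known. The ps output above shows the id child (pid 37031) as defunct: it has exited, but its parent has not reaped it, which matches the main thread being stuck in UNIXProcess.waitFor. The same call works fine from sqlci.

On Spinel we are using a HiveMetastoreServer; it is not clear whether that is related. The next step is to check whether the problem can be reproduced on a workstation.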
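
For the workstation check, a small standalone program that issues the same kind of shell call may help isolate whether the hang comes from the host process or from the lookup itself. The following is only a sketch, not Hadoop's code: the class name GroupLookupSketch and the method lookupGroups are made up for illustration, it assumes Java 8 or later (for the timed waitFor) and that bash and id are on the PATH. Unlike the unbounded waitFor in the trace, it gives up after a timeout so that a hang shows up as an error instead of a stuck thread.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.TimeUnit;

// Hypothetical helper, not part of Hadoop or Trafodion.
final class GroupLookupSketch {

    // Runs `bash -c "id -Gn"` for the current user and returns the group names.
    // Throws IOException if the child does not finish within the timeout
    // or exits with a non-zero status.
    static List<String> lookupGroups(long timeoutSeconds)
            throws IOException, InterruptedException {
        ProcessBuilder pb = new ProcessBuilder("bash", "-c", "id -Gn");
        pb.redirectErrorStream(true); // fold stderr into stdout
        Process p = pb.start();

        // Bounded wait. The output of id -Gn is small, so it fits in the
        // pipe buffer and can safely be read after the child has exited.
        if (!p.waitFor(timeoutSeconds, TimeUnit.SECONDS)) {
            p.destroyForcibly();
            throw new IOException(
                "id -Gn did not complete within " + timeoutSeconds + " s");
        }

        StringBuilder out = new StringBuilder();
        try (BufferedReader r = new BufferedReader(
                new InputStreamReader(p.getInputStream(), StandardCharsets.UTF_8))) {
            String line;
            while ((line = r.readLine()) != null) {
                out.append(line).append(' ');
            }
        }

        if (p.exitValue() != 0) {
            throw new IOException(
                "id -Gn exited with status " + p.exitValue() + ": " + out);
        }

        // id -Gn prints group names separated by whitespace.
        List<String> groups = new ArrayList<>();
        for (String g : out.toString().trim().split("\\s+")) {
            if (!g.isEmpty()) {
                groups.add(g);
            }
        }
        return groups;
    }

    public static void main(String[] args) throws Exception {
        System.out.println(lookupGroups(10));
    }
}
```

If this returns promptly on a workstation but the same lookup still hangs inside mxosrvr, that would point at something specific to the mxosrvr process environment, such as how it handles exited child processes, and not at the id command itself.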