Comment 3 for bug 1593663

Saverio Proto (zioproto) wrote :

Hello,

I retested with the Hortonworks distribution.

Using the jars:
/usr/hdp/2.4.3.0-227/hadoop-mapreduce/hadoop-openstack-2.7.1.2.4.3.0-227.jar
/usr/hdp/2.4.3.0-227/hadoop-mapreduce/hadoop-streaming-2.7.1.2.4.3.0-227.jar

With these jars I cannot reproduce the bug.

However, if I compile the hadoop-openstack jar from the Sahara repository, I can reproduce the bug.

The code here works:
https://github.com/hortonworks/hadoop-release/tree/HDP-2.5.2.1-tag/hadoop-tools/hadoop-openstack

But the code here does NOT work:
https://github.com/openstack/sahara-extra/tree/master/hadoop-swiftfs

ubuntu@ambari1:~/hadoop-swift-tutorial$ sudo -u hdfs -i hadoop jar /usr/hdp/2.4.3.0-227/hadoop-mapreduce/hadoop-streaming-2.7.1.2.4.3.0-227.jar -input swift://googlebooks-ngrams-gz-swift.switchengines/eng/oglebooks-eng-all-0gram-20120701-a.gz -output /switch/testnumber4 -mapper mapper-ngrams.py -reducer reducer-ngrams.py -file /home/ubuntu/hadoop-swift-tutorial/mapper-ngrams.py -file /home/ubuntu/hadoop-swift-tutorial/reducer-ngrams.py -numReduceTasks 1
WARNING: Use "yarn jar" to launch YARN applications.
16/12/05 17:04:48 WARN streaming.StreamJob: -file option is deprecated, please use generic option -files instead.
packageJobJar: [/home/ubuntu/hadoop-swift-tutorial/mapper-ngrams.py, /home/ubuntu/hadoop-swift-tutorial/reducer-ngrams.py] [/usr/hdp/2.4.3.0-227/hadoop-mapreduce/hadoop-streaming-2.7.1.2.4.3.0-227.jar] /tmp/streamjob1748730503238342006.jar tmpDir=null
16/12/05 17:04:49 INFO impl.TimelineClientImpl: Timeline service address: http://ambari2:8188/ws/v1/timeline/
16/12/05 17:04:49 INFO client.RMProxy: Connecting to ResourceManager at ambari2/10.0.192.10:8050
16/12/05 17:04:49 INFO impl.TimelineClientImpl: Timeline service address: http://ambari2:8188/ws/v1/timeline/
16/12/05 17:04:49 INFO client.RMProxy: Connecting to ResourceManager at ambari2/10.0.192.10:8050
16/12/05 17:04:49 ERROR streaming.StreamJob: Error Launching job : Output directory hdfs://ambari1:8020/switch/testnumber4 already exists
Streaming Command Failed!
ubuntu@ambari1:~/hadoop-swift-tutorial$ sudo -u hdfs -i hadoop jar /usr/hdp/2.4.3.0-227/hadoop-mapreduce/hadoop-streaming-2.7.1.2.4.3.0-227.jar -input swift://googlebooks-ngrams-gz-swift.switchengines/eng/googlebooks-eng-all-0gram-20120701-a.gz -output /switch/testnumber5 -mapper mapper-ngrams.py -reducer reducer-ngrams.py -file /home/ubuntu/hadoop-swift-tutorial/mapper-ngrams.py -file /home/ubuntu/hadoop-swift-tutorial/reducer-ngrams.py -numReduceTasks 1
WARNING: Use "yarn jar" to launch YARN applications.
16/12/05 17:05:33 WARN streaming.StreamJob: -file option is deprecated, please use generic option -files instead.
packageJobJar: [/home/ubuntu/hadoop-swift-tutorial/mapper-ngrams.py, /home/ubuntu/hadoop-swift-tutorial/reducer-ngrams.py] [/usr/hdp/2.4.3.0-227/hadoop-mapreduce/hadoop-streaming-2.7.1.2.4.3.0-227.jar] /tmp/streamjob3486478259414753576.jar tmpDir=null
16/12/05 17:05:34 INFO impl.TimelineClientImpl: Timeline service address: http://ambari2:8188/ws/v1/timeline/
16/12/05 17:05:34 INFO client.RMProxy: Connecting to ResourceManager at ambari2/10.0.192.10:8050
16/12/05 17:05:35 INFO impl.TimelineClientImpl: Timeline service address: http://ambari2:8188/ws/v1/timeline/
16/12/05 17:05:35 INFO client.RMProxy: Connecting to ResourceManager at ambari2/10.0.192.10:8050
16/12/05 17:05:38 INFO mapred.FileInputFormat: Total input paths to process : 1
16/12/05 17:05:40 INFO mapreduce.JobSubmitter: Cleaning up the staging area /user/hdfs/.staging/job_1480696932687_0010
16/12/05 17:05:40 ERROR streaming.StreamJob: Error launching job , bad input path : Not Found swift://googlebooks-ngrams-gz-swift.switchengines/eng/googlebooks-eng-all-0gram-20120701-a.gz/1466054520.469164168/15339202495/00000000
Streaming Command Failed!
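
To make the failing path easier to read, here is a small shell sketch (not part of the test run above; the variable names are only for illustration). It strips the requested -input URI from the path reported in the "bad input path" error, so that only the appended part remains. The appended components look like a timestamp, a byte count and a segment index, but that reading is an assumption and is not confirmed by the logs.

```shell
#!/usr/bin/env bash
# Requested -input URI, copied from the second command above.
input='swift://googlebooks-ngrams-gz-swift.switchengines/eng/googlebooks-eng-all-0gram-20120701-a.gz'

# Path reported after "bad input path : Not Found", copied from the error line.
bad='swift://googlebooks-ngrams-gz-swift.switchengines/eng/googlebooks-eng-all-0gram-20120701-a.gz/1466054520.469164168/15339202495/00000000'

# Remove the requested URI as a prefix; whatever is left was appended
# somewhere between job submission and the Swift lookup.
suffix="${bad#"$input"}"

echo "$suffix"
```

The object name itself is intact in the failing path, so the lookup fails on the extra path components rather than on the container or object name.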