Sahara + Swift + DLO: file not found

Bug #1639819 reported by Frédéric Gaudet
This bug affects 1 person
Affects: Sahara
Status: New
Importance: Undecided
Assigned to: Unassigned

Bug Description

I'm running Liberty, with Sahara 4.0.0 from the CentOS 7 repo. It is a vanilla installation, except for the hadoop-swift jar file: I downloaded a fairly recent build because it fixes a previous issue I had (socket timeout while accessing Swift). When I try to process a large file from Swift, I get a FileNotFoundException.

Steps to reproduce:
-------------------

In Swift, upload a big file split into several 1 GB segments. I pass the following argument to my jar file:

Input source file = swift://2mass.sahara/2mass.csv

The saveAsHadoopFile action should read all of the segments from the 2mass_segments container, as specified in the 2mass.csv manifest, but Java exits with the following error: Exception in thread "main" java.io.FileNotFoundException: Not Found
swift://2mass.sahara/2mass.csv/1476283376.280434/62949101353/1073741824/00000000

It seems that the path should be:
swift://2mass_segments.sahara/2mass.csv/1476283376.280434/62949101353/1073741824/00000000
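
For context, the actual cds.xmatch.spark.PreprocessData source is not included here, but the failing pattern boils down to reading the DLO manifest as a Hadoop input and writing the result back to Swift. A minimal sketch of that pattern (class and variable names are hypothetical, and the Swift credentials, fs.swift.service.sahara.*, are assumed to be injected by Sahara through the spark.xml shown further down):

// Minimal sketch only, not the real cds.xmatch.spark.PreprocessData.
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.TextOutputFormat;
import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;
import scala.Tuple2;

public class DloReadSketch {
    public static void main(String[] args) {
        JavaSparkContext sc = new JavaSparkContext(new SparkConf().setAppName("DloReadSketch"));

        // The input is the DLO manifest object; computing the input splits is
        // what forces hadoop-swift to list the segments behind the manifest.
        JavaRDD<String> lines = sc.textFile("swift://2mass.sahara/2mass.csv");

        // saveAsHadoopFile is the first action, so the FileNotFoundException
        // surfaces here (see the stack trace further down).
        lines.mapToPair(line -> new Tuple2<>(new Text(line), NullWritable.get()))
             .saveAsHadoopFile("swift://2massoutput.sahara/2mass.rdd",
                               Text.class, NullWritable.class, TextOutputFormat.class);

        sc.stop();
    }
}

The exception is thrown while Spark computes the input splits, i.e. before any data is actually read (the job fails after about 0.3 s).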

Information:
------------

The hadoop-swift MD5:
ubuntu@xmatch-fgaudet-ngt-sparkmaster-0:/usr/lib/hadoop$ md5sum hadoop-swift.jar
cbe1478523ba79497bb41cf22c7f9c79 hadoop-swift.jar

Contents of my Swift containers:
--------------------------------

ubuntu@xmatch-fgaudet-ngt-sparkmaster-0:/tmp/spark-edp/XMatch/2fe16654-58b8-4df2-b0f9-d9f4ab1a942c$ swift list
2mass
2mass_segments

ubuntu@xmatch-fgaudet-ngt-sparkmaster-0:/tmp/spark-edp/XMatch/2fe16654-58b8-4df2-b0f9-d9f4ab1a942c$ swift list --lh 2mass
   0 2016-10-26 08:03:48 None 2mass.csv
   0

Headers:
-------

ubuntu@xmatch-fgaudet-ngt-sparkmaster-0:/tmp/spark-edp/XMatch/2fe16654-58b8-4df2-b0f9-d9f4ab1a942c$ swift stat --lh 2mass 2mass.csv
       Account: v1
     Container: 2mass
        Object: 2mass.csv
  Content Type: binary/octet-stream
Content Length: 58G
 Last Modified: Wed, 26 Oct 2016 08:03:48 GMT
          ETag: "c1a3b941c9a1a41e0347bd50672215ad"
      Manifest: 2mass_segments/2mass.csv/1476283376.280434/62949101353/1073741824/
    Meta Mtime: 1476283376.280434
 Accept-Ranges: bytes
   X-Timestamp: 1477469028.93125
    X-Trans-Id: tx000000000000000003c91-0058208895-10cc84-default
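
The Manifest header above already points at the 2mass_segments container, which is why I expect the segment lookups to be issued against it rather than against 2mass. As a small illustration (this is only a sketch of how a DLO manifest value splits into a segment container and an object prefix, not code taken from hadoop-swift):

// Illustration only: how the Manifest header value maps to the expected
// segment paths. Not taken from the hadoop-swift sources.
public class ManifestPathSketch {
    public static void main(String[] args) {
        // Manifest header format: "<segment container>/<object prefix>"
        String manifest = "2mass_segments/2mass.csv/1476283376.280434/62949101353/1073741824/";

        int slash = manifest.indexOf('/');
        String segmentContainer = manifest.substring(0, slash);  // "2mass_segments"
        String segmentPrefix = manifest.substring(slash + 1);    // "2mass.csv/1476283376.280434/..."

        // First segment, as I would expect it to be looked up:
        System.out.println("swift://" + segmentContainer + ".sahara/" + segmentPrefix + "00000000");
        // -> swift://2mass_segments.sahara/2mass.csv/1476283376.280434/62949101353/1073741824/00000000
        // The stack trace instead shows the lookup going to the manifest's own
        // container (2mass), which is why Swift answers "Not Found".
    }
}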

Spark command line (as Sahara built it):
---------------------------------------

ubuntu@xmatch-fgaudet-ngt-sparkmaster-0:/tmp/spark-edp/XMatch/2fe16654-58b8-4df2-b0f9-d9f4ab1a942c$ more launch_command.log
2016-11-07 13:49:13,329 INFO Running /opt/spark/bin/spark-submit --driver-class-path /usr/lib/hadoop/lib/jackson-core-asl-1.8.8.jar:/usr/lib/hadoop/hadoop-swift.jar: --files spark.xml --class org.openstack.sahara.edp.SparkWrapper --jars builtin-4aa76035-7cc1-404d-ac34-966cc03fef73.jar --master spark://xmatch-fgaudet-ngt-sparkmaster-0:7077 --deploy-mode client cds.xmatch.spark.jar spark.xml cds.xmatch.spark.PreprocessData swift://2mass.sahara/2mass.csv RA 4 5 4096 200 swift://2massoutput.sahara/2mass.rdd

cat stderr:
------------

<---- SNIP --->

16/11/07 13:49:18 INFO scheduler.DAGScheduler: Job 0 failed: saveAsHadoopFile at PreprocessData.java:89, took 0.277151 s
Exception in thread "main" java.io.FileNotFoundException: Not Found swift://2mass.sahara/2mass.csv/1476283376.280434/62949101353/1073741824/00000000
 at org.apache.hadoop.fs.swift.snative.SwiftNativeFileSystemStore.getObjectMetadata(SwiftNativeFileSystemStore.java:240)
 at org.apache.hadoop.fs.swift.snative.SwiftNativeFileSystemStore.listDirectory(SwiftNativeFileSystemStore.java:559)
 at org.apache.hadoop.fs.swift.snative.SwiftNativeFileSystemStore.listSegments(SwiftNativeFileSystemStore.java:1213)
 at org.apache.hadoop.fs.swift.snative.SwiftNativeFileSystem.getFileBlockLocations(SwiftNativeFileSystem.java:238)
 at org.apache.hadoop.mapred.FileInputFormat.getSplits(FileInputFormat.java:332)
 at org.apache.spark.rdd.HadoopRDD.getPartitions(HadoopRDD.scala:203)
 at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:219)
 at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:217)

<--- SNIP --->

Thanks, everyone.

Vitalii Gridnev (vgridnev) wrote:

Frédéric Gaudet (frgaudet) wrote:

Yes, it seems so.
