IllegalArgumentException when accessing Swift object with name containing space character

Bug #1618252 reported by Steve Yang
This bug affects 1 person
Affects                            Status    Importance  Assigned to  Milestone
OpenStack Object Storage (swift)   Invalid   Medium      Unassigned
Sahara                             Triaged   Medium      Unassigned

Bug Description

We are using Spark and hadoop-openstack-2.6.0.jar (compile('org.apache.hadoop:hadoop-openstack:2.6.0')) to access Oracle Storage Service which is Swift-based:

DataFrame df = hiveCtx.read().format("com.databricks.spark.csv").option(...).load(objectName);

When accessing a Swift URL like "swift://Linda.oracleswift/non-matching records.csv" where the object name "non-matching records.csv" contains a space character, the following exception is thrown:

2016-08-23 15:56:03 DEBUG SwiftNativeFileSystem:126 - SwiftFileSystem initialized
java.lang.IllegalArgumentException: Illegal character in path at index 13: /non-matching records.csv
at java.net.URI.create(URI.java:859)
at org.apache.hadoop.fs.swift.util.SwiftObjectPath.<init>(SwiftObjectPath.java:59)
at org.apache.hadoop.fs.swift.util.SwiftObjectPath.fromPath(SwiftObjectPath.java:183)
at org.apache.hadoop.fs.swift.util.SwiftObjectPath.fromPath(SwiftObjectPath.java:145)
at org.apache.hadoop.fs.swift.snative.SwiftNativeFileSystemStore.toObjectPath(SwiftNativeFileSystemStore.java:434)
at org.apache.hadoop.fs.swift.snative.SwiftNativeFileSystemStore.getObjectMetadata(SwiftNativeFileSystemStore.java:211)
at org.apache.hadoop.fs.swift.snative.SwiftNativeFileSystemStore.getObjectMetadata(SwiftNativeFileSystemStore.java:181)
at org.apache.hadoop.fs.swift.snative.SwiftNativeFileSystem.getFileStatus(SwiftNativeFileSystem.java:173)
at org.apache.hadoop.fs.Globber.getFileStatus(Globber.java:64)
at org.apache.hadoop.fs.Globber.doGlob(Globber.java:272)
at org.apache.hadoop.fs.Globber.glob(Globber.java:151)
at org.apache.hadoop.fs.FileSystem.globStatus(FileSystem.java:1653)
at org.apache.hadoop.mapred.FileInputFormat.singleThreadedListStatus(FileInputFormat.java:259)
...

The exception is evidently triggered by the space character. However, the debug messages logged just before the error show:

2016-08-23 15:56:03 DEBUG SwiftNativeFileSystem:122 - Initializing SwiftNativeFileSystem against URI swift://Linda.oracleswift/non-matching%20records.csv and working dir swift://Linda.oracleswift/user/syang
2016-08-23 15:56:03 DEBUG RestClientBindings:141 - Filesystem swift://Linda.oracleswift/non-matching%20records.csv is using configuration keys fs.swift.service.oracleswift
...

The space character has already been encoded as "%20", so the Swift URL passed into SwiftNativeFileSystem appears to be properly encoded.
Because of this error, any Swift object whose name contains a space character (and possibly a slash '/' character as well?) cannot be accessed.
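
The exception itself can be reproduced outside Hadoop. The sketch below assumes, based only on the stack trace, that SwiftObjectPath passes an already-decoded path string to URI.create; the class and method names (SpaceInPathDemo, tryCreate, encodePath) are hypothetical and are not part of hadoop-openstack.

```java
import java.net.URI;
import java.net.URISyntaxException;

class SpaceInPathDemo {

    // Returns the message of the exception URI.create throws for the given
    // string, or null if the string parses as a URI.
    static String tryCreate(String rawPath) {
        try {
            URI.create(rawPath);
            return null;
        } catch (IllegalArgumentException e) {
            return e.getMessage();
        }
    }

    // Percent-encodes illegal characters in a decoded path by using the
    // multi-argument URI constructor, which quotes them instead of rejecting them.
    static String encodePath(String decodedPath) {
        try {
            return new URI(null, null, decodedPath, null).getRawPath();
        } catch (URISyntaxException e) {
            throw new IllegalArgumentException(e);
        }
    }

    public static void main(String[] args) {
        String decoded = "/non-matching records.csv";
        // URI.create expects an already-encoded string, so the space is rejected.
        System.out.println(tryCreate(decoded));
        // The encoded form parses without error, so tryCreate returns null.
        System.out.println(tryCreate("/non-matching%20records.csv"));
        // The multi-argument constructor encodes the space itself.
        System.out.println(encodePath(decoded)); // /non-matching%20records.csv
    }
}
```

If that assumption holds, the failure does not depend on how the caller encoded the URL: the path is decoded somewhere between the "Initializing" log line and SwiftObjectPath, and the decoded form is then handed to a method that only accepts encoded input.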

As an additional data point, if we first encode the object name ("non-matching records.csv" => "non-matching%20records.csv") before passing it to the OpenStack Swift API, a different error is raised. This time the path separator '/' after the container name 'Linda' is somehow encoded by SwiftNativeFileSystemStore:

2016-08-23 10:56:41 DEBUG SwiftRestClient:1731 - Status code = 400
2016-08-23 10:56:41 DEBUG SwiftRestClient:1445 - Method HEAD on https://storage.oraclecorp.com/v1/Storage-dfisher/Linda%2Fnon-matching%20records.csv failed, status code: 400, status line: HTTP/1.1 400 Bad Request
BadRequest: Bad request against https://storage.oraclecorp.com/v1/Storage-dfisher/Linda%2Fnon-matching%20records.csv HEAD https://storage.oraclecorp.com/v1/Storage-dfisher/Linda%2Fnon-matching%20records.csv => 400
at org.apache.hadoop.fs.swift.http.SwiftRestClient.buildException(SwiftRestClient.java:1456)
at org.apache.hadoop.fs.swift.http.SwiftRestClient.perform(SwiftRestClient.java:1403)
at org.apache.hadoop.fs.swift.http.SwiftRestClient.headRequest(SwiftRestClient.java:1016)
at org.apache.hadoop.fs.swift.snative.SwiftNativeFileSystemStore.stat(SwiftNativeFileSystemStore.java:257)
at org.apache.hadoop.fs.swift.snative.SwiftNativeFileSystemStore.getObjectMetadata(SwiftNativeFileSystemStore.java:212)
at org.apache.hadoop.fs.swift.snative.SwiftNativeFileSystemStore.getObjectMetadata(SwiftNativeFileSystemStore.java:181)
at org.apache.hadoop.fs.swift.snative.SwiftNativeFileSystem.getFileStatus(SwiftNativeFileSystem.java:173)

So the request always fails, whether or not the Swift object name is URL-encoded.
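
The "Linda%2F" in the failing request is what results when a container/object string is percent-encoded as a single path segment. The sketch below contrasts that with encoding each segment separately. It is only an illustration of the symptom, not the actual SwiftRestClient code, and the names SwiftPathEncodingDemo, encodeSegment and encodePerSegment are hypothetical. URLEncoder implements form encoding, so it is used here as an approximation of path-segment encoding, with '+' rewritten to "%20".

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

class SwiftPathEncodingDemo {

    // Percent-encodes a string as one path segment. URLEncoder emits '+' for
    // a space (form encoding), so that is rewritten to "%20" afterwards.
    // A literal '+' in the input is already "%2B" at that point.
    static String encodeSegment(String segment) {
        try {
            return URLEncoder.encode(segment, "UTF-8").replace("+", "%20");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException("UTF-8 is always supported", e);
        }
    }

    // Encodes each '/'-separated segment on its own so the separators survive.
    static String encodePerSegment(String containerAndObject) {
        String[] segments = containerAndObject.split("/", -1);
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < segments.length; i++) {
            if (i > 0) {
                out.append('/');
            }
            out.append(encodeSegment(segments[i]));
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String decoded = "Linda/non-matching records.csv";
        // Whole string treated as one segment: the separator is encoded too.
        System.out.println(encodeSegment(decoded));    // Linda%2Fnon-matching%20records.csv
        // Per-segment encoding keeps the container/object separator.
        System.out.println(encodePerSegment(decoded)); // Linda/non-matching%20records.csv
        // Encoding an already-encoded name encodes the '%' a second time.
        System.out.println(encodePerSegment("Linda/non-matching%20records.csv")); // Linda/non-matching%2520records.csv
    }
}
```

The last call also shows why pre-encoding the object name on the caller's side is fragile: any later encoding step turns "%20" into "%2520", which names a different object.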

clayg (clay-gerrard) wrote :

As I'm sure you've already noted, there's nothing complicated about having an object with a space in the name stored in Swift - simply URL-encode the object name when you make the request and it "just works" [1]

The issue must be in the handling in org.apache.hadoop.fs.swift - but I'm not sure how to file bugs against that project? Maybe on their jira? [2]

Then again, hrmm... maybe - you might want to double-check that you have all the latest fixes by following the swift-hadoop integration guide [3]

I *think* this should be invalid on Swift, but I'm going to tag it for Sahara for now and see what that team thinks?

1. https://gist.github.com/clayg/c414c632c7ec10aff8f3db6bb042740b
2. https://issues.apache.org/jira/browse/HADOOP-12079?jql=project%20%3D%20HADOOP%20AND%20text%20~%20%22swift%22
3. http://docs.openstack.org/developer/sahara/userdoc/hadoop-swift.html

Changed in swift:
status: New → Incomplete
importance: Undecided → Medium
Tim Burke (1-tim-z) wrote :

The IllegalArgumentException makes sense, but the 400 seems curious. I wonder what the body of the response had to say?

Maybe there needs to be more logging around https://github.com/apache/hadoop/blob/release-2.6.0/hadoop-tools/hadoop-openstack/src/main/java/org/apache/hadoop/fs/swift/http/SwiftRestClient.java#L1445 ?

Vitalii Gridnev (vgridnev) wrote :

There are two ways we can fix this issue:

1. Propose a commit to github.com/openstack/sahara-extra and fix it there. This library is actually a fork of upstream hadoop-openstack, made to support Keystone V3 auth.
2. Fix the issue upstream in hadoop-openstack, but I'm not very familiar with the process for fixing issues in upstream Hadoop.

About 1, the Sahara team always welcomes anyone who wants to contribute fixes, so if you need additional help, you can find it in the openstack-sahara IRC channel.

About the bug, I think Medium priority makes sense.

Changed in sahara:
importance: Undecided → Medium
Steve Yang (syang97) wrote :

This issue seems to belong to the Apache Hadoop fs/swift client code (org.apache.hadoop.fs.swift). Please let me know if I got it wrong. I raised a Hadoop issue for it: https://issues.apache.org/jira/browse/HADOOP-13618

Changed in sahara:
status: New → Triaged
Tim Burke (1-tim-z)
Changed in swift:
status: Incomplete → Invalid