We have code that accepts a Swift container as storage for job binaries, here:
https://github.com/openstack/sahara/blob/93b79027acd659bf007cdd7f5f79785c0c86adf3/sahara/service/edp/binary_retrievers/internal_swift.py#L54
The expectation is that all files inside the container are taken as job binaries, but the code does not work. The following exception occurs at a later stage:
2014-04-11 13:33:39.680 61774 TRACE sahara.context Traceback (most recent call last):
2014-04-11 13:33:39.680 61774 TRACE sahara.context File "/Users/dmitryme/projects/elastic-hadoop/sahara/sahara/context.py", line 124, in _wrapper
2014-04-11 13:33:39.680 61774 TRACE sahara.context func(*args, **kwargs)
2014-04-11 13:33:39.680 61774 TRACE sahara.context File "/Users/dmitryme/projects/elastic-hadoop/sahara/sahara/service/edp/job_manager.py", line 144, in run_job
2014-04-11 13:33:39.680 61774 TRACE sahara.context upload_job_files(oozie_server, wf_dir, job, hdfs_user)
2014-04-11 13:33:39.680 61774 TRACE sahara.context File "/Users/dmitryme/projects/elastic-hadoop/sahara/sahara/service/edp/job_manager.py", line 194, in upload_job_files
2014-04-11 13:33:39.680 61774 TRACE sahara.context hdfs_user)
2014-04-11 13:33:39.680 61774 TRACE sahara.context File "/Users/dmitryme/projects/elastic-hadoop/sahara/sahara/service/edp/hdfs_helper.py", line 30, in put_file_to_hdfs
2014-04-11 13:33:39.680 61774 TRACE sahara.context r.write_file_to('/tmp/%s' % file_name, file)
2014-04-11 13:33:39.680 61774 TRACE sahara.context File "/Users/dmitryme/projects/elastic-hadoop/sahara/sahara/utils/ssh_remote.py", line 371, in write_file_to
2014-04-11 13:33:39.680 61774 TRACE sahara.context self._run_s(_write_file_to, timeout, remote_file, data, run_as_root)
2014-04-11 13:33:39.680 61774 TRACE sahara.context File "/Users/dmitryme/projects/elastic-hadoop/sahara/sahara/utils/ssh_remote.py", line 428, in _run_s
2014-04-11 13:33:39.680 61774 TRACE sahara.context return self._run_with_log(func, timeout, *args, **kwargs)
2014-04-11 13:33:39.680 61774 TRACE sahara.context File "/Users/dmitryme/projects/elastic-hadoop/sahara/sahara/utils/ssh_remote.py", line 334, in _run_with_log
2014-04-11 13:33:39.680 61774 TRACE sahara.context return self._run(func, *args, **kwargs)
2014-04-11 13:33:39.680 61774 TRACE sahara.context File "/Users/dmitryme/projects/elastic-hadoop/sahara/sahara/utils/ssh_remote.py", line 425, in _run
2014-04-11 13:33:39.680 61774 TRACE sahara.context return procutils.run_in_subprocess(self.proc, func, args, kwargs)
2014-04-11 13:33:39.680 61774 TRACE sahara.context File "/Users/dmitryme/projects/elastic-hadoop/sahara/sahara/utils/procutils.py", line 52, in run_in_subprocess
2014-04-11 13:33:39.680 61774 TRACE sahara.context raise SubprocessException(result['exception'])
2014-04-11 13:33:39.680 61774 TRACE sahara.context SubprocessException: TypeError: Expected unicode or bytes, got {'edp-lib.jar': '<binary data here>'}
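The TypeError suggests the container retriever hands back a dict mapping object names to binary data, while `write_file_to()` expects the raw bytes of a single file. A minimal sketch of one way the caller could unpack such a dict (the `write_file_to` name and the `/tmp/%s` path mirror the traceback; the loop itself and the `put_files_to_hdfs` helper name are assumptions, not Sahara's actual fix):

```python
def put_files_to_hdfs(remote, binaries):
    """Write each job binary to HDFS staging individually.

    `binaries` is assumed to be a dict mapping file name -> raw bytes,
    as produced when a whole Swift container is used as the job-binary
    source (e.g. {'edp-lib.jar': b'...'}).  Passing the dict itself to
    write_file_to() triggers the TypeError above.
    """
    for file_name, data in binaries.items():
        # write_file_to expects bytes, so pass each file's data separately
        remote.write_file_to('/tmp/%s' % file_name, data)
```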
I think we should consider enhancing binary retrieval from a container so that all binaries matching a given prefix can be fetched.
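Prefix-based retrieval could look roughly like the sketch below. It assumes the python-swiftclient `Connection` API, where `get_container()` accepts a `prefix` filter and `get_object()` returns a `(headers, bytes)` tuple; the `get_binaries_with_prefix` helper itself is illustrative, not existing Sahara code:

```python
def get_binaries_with_prefix(conn, container, prefix):
    """Fetch every object in `container` whose name starts with `prefix`.

    `conn` is assumed to follow the python-swiftclient Connection API.
    Returns a dict mapping object name -> binary data.
    """
    # Let Swift do the filtering server-side via the prefix parameter.
    _headers, objects = conn.get_container(container, prefix=prefix)
    return {obj['name']: conn.get_object(container, obj['name'])[1]
            for obj in objects}
```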
There is a secondary issue here, too, which could perhaps be handled under the same bug: if the path does not exist, Sahara gets an unhandled exception when it makes the HEAD request to Swift. An invalid path should be handled gracefully and the job execution should be moved to the "KILLED" state.
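One way to do this is to validate the path up front and let the caller move the execution to "KILLED" instead of letting the HEAD request's exception propagate. A minimal sketch, assuming the python-swiftclient `head_object()` call (which raises `swiftclient.ClientException` on a missing object); the helper name and the injected exception parameter are illustrative:

```python
def swift_path_exists(conn, container, object_name, not_found_exc):
    """Return True if the object answers a HEAD request, False otherwise.

    `conn` is assumed to follow the python-swiftclient Connection API;
    `not_found_exc` is the exception type raised for a missing object
    (swiftclient.ClientException in practice).
    """
    try:
        conn.head_object(container, object_name)
        return True
    except not_found_exc:
        # Invalid path: signal the caller rather than crash the job run.
        return False
```

The caller would then set the job execution status to "KILLED" when this returns False, rather than surfacing the raw Swift exception.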