As I mentioned in my previous post, I also had a chance to talk about what we've been up to over here at Dataiku during my stay in Berlin:
Any real-life data analysis project is made up of a large number of tasks that rely on various tools (Hadoop MapReduce, Hive, Pig, SQL, Python, R, NoSQL, ...).
Flow brings a whole new approach to the problem of orchestrating and managing these kinds of complex data pipelines.
The people in the "Palais" room were clearly interested in Flow's approach, and several of them approached me afterwards, eager to get their hands on it. We will make the first betas of Flow available quite soon. You can register to be notified as soon as we have something ready.
dctc also garnered quite a lot of interest. dctc is a command-line tool that eases the manipulation and transfer of files across various storage types (local filesystem, HDFS, Amazon S3, Google Cloud Storage, FTP, SCP/SFTP).
You can use it for listing, uploading, downloading, incremental synchronization, dispatching files into several files, in-cloud editing, and much more.
Anyone who has had to deal with multiple stages of data transfers will definitely benefit from this tool. Start using dctc right now, at http://dctc.io. Don't hesitate to report any issues you might encounter!