How to use Python, Pandas and Scikit-learn for Kaggle challenges

Technology | Data Science | Machine Learning | Data Preparation | August 05, 2015 | PierreG

There is still a little less than a month left to compete in the Caterpillar Tube Pricing Kaggle challenge. In this competition, players are asked to predict the price of tubes from different suppliers.

Since the data is rather small and in a simple format, this challenge is a perfect way to start using Python and two of its most widely used data science packages: pandas and scikit-learn.

Join us at the next New York Big Data Workshop!

Henri Dwyer (Data Scientist at Dataiku) and I will lead the next New York Big Data Workshop on this challenge on August 14th. Feel free to join here.

Start training with an IPython Notebook

We created an example IPython notebook that shows how to load the files, reshape the data, and build a first model so that you can submit to Kaggle. You can also download the complete IPython Notebook file here.
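To give a feel for those three steps (load, reshape, model), here is a minimal sketch under stated assumptions; it is not the notebook's actual code. It assumes the quote file has columns such as `tube_assembly_id`, `supplier`, `quantity`, `bracket_pricing` and `cost`, and that per-tube specs (`diameter`, `length`, `num_bends`) live in a separate table keyed by `tube_assembly_id`. Tiny in-memory DataFrames stand in for the real CSV files, which you would load with `pd.read_csv`. The helper `build_features`, the random forest, the log transform of the target and the `id`/`cost` submission columns are illustrative choices, not necessarily what the notebook or Kaggle prescribe.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestRegressor

# Hypothetical stand-ins for the competition files; with the real data you
# would load them with pd.read_csv(...). Column names are assumptions.
train = pd.DataFrame({
    "tube_assembly_id": ["TA-1", "TA-1", "TA-2", "TA-3"],
    "supplier": ["S-1", "S-1", "S-2", "S-1"],
    "quantity": [1, 10, 1, 5],
    "bracket_pricing": ["Yes", "Yes", "No", "Yes"],
    "cost": [20.0, 8.0, 35.0, 12.0],
})
test = pd.DataFrame({
    "id": [1, 2],
    "tube_assembly_id": ["TA-2", "TA-3"],
    "supplier": ["S-2", "S-1"],
    "quantity": [2, 1],
    "bracket_pricing": ["No", "Yes"],
})
tube = pd.DataFrame({
    "tube_assembly_id": ["TA-1", "TA-2", "TA-3"],
    "diameter": [6.35, 19.05, 12.7],
    "length": [137.0, 50.0, 80.0],
    "num_bends": [8, 2, 4],
})


def build_features(quotes, tube, columns=None):
    """Hypothetical helper: join tube specs onto quotes and encode categoricals."""
    # Left join keeps one row per quote, in the original order.
    merged = quotes.merge(tube, on="tube_assembly_id", how="left")
    # Turn the Yes/No flag into 1/0.
    merged["bracket_pricing"] = (merged["bracket_pricing"] == "Yes").astype(int)
    # Drop identifiers and the target, then one-hot encode the supplier.
    X = pd.get_dummies(
        merged.drop(columns=["tube_assembly_id", "id", "cost"], errors="ignore"),
        columns=["supplier"],
    )
    if columns is not None:
        # Use exactly the training columns: missing ones are filled with 0,
        # columns that only appear in the test data are dropped.
        X = X.reindex(columns=columns, fill_value=0)
    return X


X_train = build_features(train, tube)
# Model log(1 + cost), a common choice for skewed prices (assumption).
y_train = np.log1p(train["cost"])

model = RandomForestRegressor(n_estimators=100, random_state=0)
model.fit(X_train, y_train)

X_test = build_features(test, tube, columns=X_train.columns)
# Undo the log transform to get predicted prices back.
predicted_cost = np.expm1(model.predict(X_test))

submission = pd.DataFrame({"id": test["id"], "cost": predicted_cost})
print(submission)
# submission.to_csv("submission.csv", index=False)  # file to upload to Kaggle
```

The merge on `tube_assembly_id` is the "reshape" step: each quote row picks up the specs of its tube. Reindexing the test matrix to the training columns matters because one-hot encoding can produce different dummy columns on each file, while scikit-learn expects the same features at prediction time as at training time.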


Any questions about this blog post? Just send me an email and we'll discuss them :)
